
Shuffling Momentum Gradient Algorithm for Convex Optimization (2403.03180v1)

Published 5 Mar 2024 in math.OC and cs.LG

Abstract: The Stochastic Gradient Descent method (SGD) and its stochastic variants have become methods of choice for solving finite-sum optimization problems arising from machine learning and data science, thanks to their ability to handle large-scale applications and big datasets. Over the past decades, researchers have made substantial efforts to study the theoretical performance of SGD and its shuffling variants. However, only limited work has investigated its shuffling momentum variants, including shuffling heavy-ball momentum schemes for non-convex problems and Nesterov's momentum for convex settings. In this work, we extend the analysis of the shuffling momentum gradient method developed in [Tran et al (2021)] to both finite-sum convex and strongly convex optimization problems. We provide the first analysis of shuffling momentum-based methods for the strongly convex setting, attaining a convergence rate of $O(1/nT^2)$, where $n$ is the number of samples and $T$ is the number of training epochs. Our analysis is state-of-the-art, matching the best rates of existing shuffling stochastic gradient algorithms in the literature.
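To make the setting concrete, the sketch below shows a generic shuffling-type (random reshuffling) SGD step combined with heavy-ball momentum on a synthetic finite-sum least-squares problem. It is a minimal illustration of the family of methods the abstract discusses, not the exact SMG update of Tran et al. (2021): the least-squares objective, the diminishing per-epoch step size, the momentum constant, and the function name `shuffling_momentum_sgd` are all illustrative assumptions.

```python
# Minimal sketch (not the authors' exact SMG algorithm) of a shuffling-type
# gradient method with heavy-ball momentum on the finite-sum problem
#   F(w) = (1/n) * sum_i 0.5 * (x_i^T w - y_i)^2.
# Each epoch visits the n component gradients in a freshly shuffled order.
import numpy as np

def shuffling_momentum_sgd(X, y, epochs=50, lr=0.05, beta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)   # iterate
    m = np.zeros(d)   # momentum buffer
    for t in range(epochs):
        perm = rng.permutation(n)         # sampling without replacement
        eta = lr / (t + 1)                # illustrative diminishing epoch step size
        for i in perm:
            g = (X[i] @ w - y[i]) * X[i]  # gradient of the i-th component
            m = beta * m + g              # heavy-ball momentum accumulation
            w = w - eta * m
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 500, 20
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.01 * rng.standard_normal(n)
    w_hat = shuffling_momentum_sgd(X, y)
    print("parameter error:", np.linalg.norm(w_hat - w_true))
```

The per-epoch permutation is the defining feature of shuffling methods: every component gradient is used exactly once per epoch, which is what enables the faster epoch-wise rates (such as the $O(1/nT^2)$ rate claimed above for the strongly convex case) compared with with-replacement SGD.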

References (40)
  1. Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press (2012)
  2. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization Methods for Large-Scale Machine Learning. SIAM Rev. 60(2), 223–311 (2018)
  3. Robbins, H., Monro, S.: A stochastic approximation method. The Annals of Mathematical Statistics 22(3), 400–407 (1951)
  4. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12, 2121–2159 (2011)
  5. Kingma, D.P., Ba, J.: ADAM: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR), abs/1412.6980 (2014)
  6. Nguyen, L., Nguyen, P.H., Dijk, M., Richtarik, P., Scheinberg, K., Takac, M.: SGD and Hogwild! convergence without the bounded gradients assumption. In: Proceedings of the 35th International Conference on Machine Learning, Volume 80, pp. 3747–3755 (2018)
  7. Bottou, L.: Curiously fast convergence of some stochastic gradient descent algorithms. In: Proceedings of the Symposium on Learning and Data Science, Paris, vol. 8, pp. 2624–2633 (2009)
  8. Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.A.: Why random reshuffling beats stochastic gradient descent. Mathematical Programming (2019). https://doi.org/10.1007/s10107-019-01440-w
  9. Haochen, J., Sra, S.: Random shuffling beats SGD after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR
  10. Safran, I., Shamir, O.: How good is SGD with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR
  11. Nagaraj, D., Jain, P., Netrapalli, P.: SGD without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019)
  12. Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of SGD without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR
  13. Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021)
  14. Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020)
  15. Ahn, K., Yun, C., Sra, S.: SGD with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020)
  16. Reddi, S.J., Kale, S., Kumar, S.: On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237 (2019)
  17. Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf
  18. Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf
  19. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  20. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html
  21. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $\mathcal{O}(1/k^2)$. Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl.
  22. Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004)
  23. Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016)
  24. Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022)
  25. Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019)
  26. Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html
  27. Bottou, L.: Stochastic gradient descent tricks. Springer (2012)
  28. Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011). https://doi.org/10.1007/s12532-013-0053-8
  29. Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001)
  30. Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001)
  31. Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016)
  32. Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012)
  33. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)
  34. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013)
  35. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 2613–2621 (2017). JMLR.org
  36. Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023)
  37. Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021)
  38. Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017)
  39. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
  40. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
[2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. 
In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). 
https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. 
[2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. 
[2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 
1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. 
Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. 
Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. 
In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. 
Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). 
https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. 
https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. 
In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L., Nguyen, P.H., Dijk, M., Richtarik, P., Scheinberg, K., Takac, M.: SGD and Hogwild! convergence without the bounded gradients assumption. In: Proceedings of the 35th International Conference on Machine Learning-Volume 80, pp. 3747–3755 (2018) Bottou [2009] Bottou, L.: Curiously fast convergence of some stochastic gradient descent algorithms. In: Proceedings of the Symposium on Learning and Data Science, Paris, vol. 8, pp. 2624–2633 (2009) Gürbüzbalaban et al. [2019] Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.A.: Why random reshuffling beats stochastic gradient descent. Mathematical Programming (2019) https://doi.org/10.1007/s10107-019-01440-w Haochen and Sra [2019] Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. 
Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Curiously fast convergence of some stochastic gradient descent algorithms. In: Proceedings of the Symposium on Learning and Data Science, Paris, vol. 8, pp. 2624–2633 (2009) Gürbüzbalaban et al. [2019] Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.A.: Why random reshuffling beats stochastic gradient descent. Mathematical Programming (2019) https://doi.org/10.1007/s10107-019-01440-w Haochen and Sra [2019] Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) 
Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.A.: Why random reshuffling beats stochastic gradient descent. Mathematical Programming (2019) https://doi.org/10.1007/s10107-019-01440-w Haochen and Sra [2019] Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. 
Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). 
PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. 
Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. 
[2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. 
(eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). 
https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). 
https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. 
In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. 
[2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). 
https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. 
Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). 
https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. 
Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. 
In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 
1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. 
Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. 
[2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). 
https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. 
SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). 
https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. 
arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. 
PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). 
https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. 
[2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). 
https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. 
Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. 
[2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). 
https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. 
SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). 
https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. 
https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Curiously fast convergence of some stochastic gradient descent algorithms. In: Proceedings of the Symposium on Learning and Data Science, Paris, vol. 8, pp. 2624–2633 (2009) Gürbüzbalaban et al. [2019] Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.A.: Why random reshuffling beats stochastic gradient descent. Mathematical Programming (2019) https://doi.org/10.1007/s10107-019-01440-w Haochen and Sra [2019] Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. 
Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.A.: Why random reshuffling beats stochastic gradient descent. Mathematical Programming (2019) https://doi.org/10.1007/s10107-019-01440-w Haochen and Sra [2019] Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. 
In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). 
https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. 
[2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). 
https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. 
Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. 
[2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). 
https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
  6. Nguyen, L., Nguyen, P.H., Dijk, M., Richtarik, P., Scheinberg, K., Takac, M.: SGD and Hogwild! convergence without the bounded gradients assumption. In: Proceedings of the 35th International Conference on Machine Learning-Volume 80, pp. 3747–3755 (2018) Bottou [2009] Bottou, L.: Curiously fast convergence of some stochastic gradient descent algorithms. In: Proceedings of the Symposium on Learning and Data Science, Paris, vol. 8, pp. 2624–2633 (2009) Gürbüzbalaban et al. [2019] Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.A.: Why random reshuffling beats stochastic gradient descent. Mathematical Programming (2019) https://doi.org/10.1007/s10107-019-01440-w Haochen and Sra [2019] Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. 
Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Curiously fast convergence of some stochastic gradient descent algorithms. In: Proceedings of the Symposium on Learning and Data Science, Paris, vol. 8, pp. 
[2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. 
Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. 
ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. 
Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 
1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. 
Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. 
ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. 
Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 
1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. 
In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. 
In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. 
[2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. 
SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  9. Haochen, J., Sra, S.: Random shuffling beats sgd after finite epochs. In: International Conference on Machine Learning, pp. 2624–2633 (2019). PMLR Safran and Shamir [2020] Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). 
https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Safran, I., Shamir, O.: How good is sgd with random shuffling? In: Conference on Learning Theory, pp. 3250–3284 (2020). PMLR Nagaraj et al. [2019] Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. 
Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nagaraj, D., Jain, P., Netrapalli, P.: Sgd without replacement: Sharper rates for general smooth convex functions. In: International Conference on Machine Learning, pp. 4703–4711 (2019) Rajput et al. [2020] Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. 
(eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Rajput, S., Gupta, A., Papailiopoulos, D.: Closing the convergence gap of sgd without replacement. In: International Conference on Machine Learning, pp. 7964–7973 (2020). PMLR Nguyen et al. [2021] Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). 
https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021) Mishchenko et al. [2020] Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). 
https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. 
In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. 
[2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). 
https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. 
Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). 
https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). 
https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 
1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. 
Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. 
[2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). 
https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. 
[2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. 
SIAM (2001)

Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $\mathcal{O}(1/k^{2})$. Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl.
Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004)
Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016)
Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022)
Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019)
Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html
Bottou, L.: Stochastic gradient descent tricks. Springer (2012)
Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011). https://doi.org/10.1007/s12532-013-0053-8
Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic Optimization: Algorithms and Applications, 223–264 (2001)
Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001)
Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016)
Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012)
Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013)
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 2613–2621. JMLR.org (2017)
Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. In: The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023)
Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021)
Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017)
Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. 
[2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. 
In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. 
ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. 
[2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. 
SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. 
In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. 
[2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 
46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. 
[2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. 
[2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. 
SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. 
In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. 
  13. Nguyen, L.M., Tran-Dinh, Q., Phan, D.T., Nguyen, P.H., Van Dijk, M.: A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res. 22(1) (2021)
  14. Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020)
  15. Ahn, K., Yun, C., Sra, S.: SGD with shuffling: Optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020)
  16. Reddi, S.J., Kale, S., Kumar, S.: On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237 (2019)
  17. Hu, C., Pan, W., Kwok, J.: Accelerated gradient methods for stochastic optimization and online learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf
  18. Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better mini-batch algorithms via accelerated gradient methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf
  19. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  20. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html
  21. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $\mathcal{O}(1/k^2)$. Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl.
  22. Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004)
  23. Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016)
  24. Jin, R., Xing, Y., He, X.: On the convergence of mSGD and AdaGrad for stochastic optimization (2022)
  25. Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019)
  26. Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov accelerated shuffling gradient method for convex optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html
  27. Bottou, L.: Stochastic gradient descent tricks. Springer (2012)
  28. Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011). https://doi.org/10.1007/s12532-013-0053-8
  29. Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic Optimization: Algorithms and Applications, 223–264 (2001)
  30. Nedić, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001)
  31. Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016)
  32. Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012)
  33. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)
  34. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013)
  35. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 2613–2621 (2017). JMLR.org
  36. Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. In: The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023)
  37. Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021)
  38. Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts (2017)
  39. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
  40. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. 
In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. 
In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. 
ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  14. Mishchenko, K., Khaled Ragab Bayoumi, A., Richtárik, P.: Random reshuffling: Simple analysis with vast improvements. Advances in Neural Information Processing Systems 33 (2020) Ahn et al. [2020] Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Ahn, K., Yun, C., Sra, S.: Sgd with shuffling: optimal rates without component convexity and large epoch requirements. arXiv preprint arXiv:2006.06946 (2020) Reddi et al. [2019] Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. [2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. 
Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Reddi, S.J., Kale, S., Kumar, S.: On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 (2019) Hu et al. 
[2009] Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf Cotter et al. [2011] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. 
Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). 
Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. 
[2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. 
[2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. 
[2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). 
https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. 
Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. 
[2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever et al. [2013] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. 
In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence 𝒪⁢(1/k2)𝒪1superscript𝑘2\mathcal{O}(1/k^{2})caligraphic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl. Nesterov [2004] Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. 
In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. 
[2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. 
SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  17. Hu, C., Pan, W., Kwok, J.: Accelerated Gradient Methods for Stochastic Optimization and Online Learning. Curran Associates, Inc. (2009). https://proceedings.neurips.cc/paper/2009/file/ec5aa0b7846082a2415f0902f0da88f2-Paper.pdf
  18. Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better Mini-Batch Algorithms via Accelerated Gradient Methods. Curran Associates, Inc. (2011). https://proceedings.neurips.cc/paper/2011/file/b55ec28c52d5f6205684a473a2193564-Paper.pdf
Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004) Yuan et al. [2016] Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016) Jin et al. [2022] Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. 
In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 
1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. 
[2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. 
In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  19. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 
46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 
2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  20. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1139–1147. PMLR, Atlanta, Georgia, USA (2013). https://proceedings.mlr.press/v28/sutskever13.html
  21. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $\mathcal{O}(1/k^2)$. Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl.
  22. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers (2004)
  23. Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016)
  24. Jin, R., Xing, Y., He, X.: On the convergence of mSGD and AdaGrad for stochastic optimization (2022)
  25. Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019)
  26. Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov accelerated shuffling gradient method for convex optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html
  27. Bottou, L.: Stochastic gradient descent tricks. Springer (2012)
  28. Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011). https://doi.org/10.1007/s12532-013-0053-8
  29. Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic Optimization: Algorithms and Applications, 223–264 (2001)
  30. Nedić, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001)
  31. Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016)
  32. Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012)
  33. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)
  34. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013)
  35. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 2613–2621 (2017). JMLR.org
  36. Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023)
  37. Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021)
  38. Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts (2017)
  39. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
  40. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. 
[2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  21. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $\mathcal{O}(1/k^2)$. Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. Dokl.
  22. Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers (2004)
  23. Yuan, K., Ying, B., Sayed, A.H.: On the influence of momentum acceleration on online learning. Journal of Machine Learning Research 17(192), 1–66 (2016)
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  24. Jin, R., Xing, Y., He, X.: On the Convergence of mSGD and AdaGrad for Stochastic Optimization (2022) Liu et al. [2019] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. 
Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. 
[2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. 
The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. 
[2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. 
org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 
6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. 
USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
  25. Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., Han, J.: On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019) Tran et al. [2022] Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html Bottou [2012] Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. 
Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Bottou, L.: Stochastic gradient descent tricks. Springer (2012) Recht and Ré [2011] Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011) https://doi.org/10.1007/s12532-013-0053-8 Nedić and Bertsekas [2001] Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). 
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. Stochastic optimization: algorithms and applications, 223–264 (2001) Nedic and Bertsekas [2001] Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR. org Nguyen and Tran [2023] Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023) Li et al. [2021] Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021) Loshchilov and Hutter [2017] Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) Chang and Lin [2011] Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27–12727 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm Polyak [1964] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964) Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001) Shamir [2016] Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016) Le Roux et al. [2012] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2663–2671 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. 
In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2613–2621 (2017). JMLR.org
  26. Tran, T.H., Scheinberg, K., Nguyen, L.M.: Nesterov Accelerated Shuffling Gradient Method for Convex Optimization. PMLR (2022). https://proceedings.mlr.press/v162/tran22a.html
  27. Bottou, L.: Stochastic gradient descent tricks. Springer (2012)
  28. Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation 5 (2011). https://doi.org/10.1007/s12532-013-0053-8
  29. Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. In: Stochastic Optimization: Algorithms and Applications, pp. 223–264 (2001)
  30. Nedić, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM (2001)
  31. Shamir, O.: Without-replacement sampling for stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp. 46–54 (2016)
  32. Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems (NIPS), pp. 2663–2671 (2012)
  33. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)
  34. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013)
  35. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 2613–2621 (2017). JMLR.org
  36. Nguyen, L.M., Tran, T.H.: On the convergence to a global solution of shuffling-type gradient algorithms. In: The 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023)
  37. Li, X., Zhuang, Z., Orabona, F.: A second look at exponential and cosine step sizes: Simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning, pp. 6553–6564 (2021)
  38. Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017)
  39. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
  40. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)
