SPIRAL: A superlinearly convergent incremental proximal algorithm for nonconvex finite sum minimization (2207.08195v2)

Published 17 Jul 2022 in math.OC and cs.LG

Abstract: We introduce SPIRAL, a SuPerlinearly convergent Incremental pRoximal ALgorithm, for solving nonconvex regularized finite sum problems under a relative smoothness assumption. Each iteration of SPIRAL consists of an inner and an outer loop. It combines incremental gradient updates with a linesearch that has the remarkable property of never being triggered asymptotically, leading to superlinear convergence under mild assumptions at the limit point. Simulation results with L-BFGS directions on various convex, nonconvex, and non-Lipschitz-differentiable problems show that our algorithm and its adaptive variant are competitive with the state of the art.
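
To make the problem class concrete, the sketch below shows a generic incremental proximal-gradient loop for a regularized finite-sum problem min_x (1/N) Σ_i f_i(x) + g(x), with g taken as an l1 regularizer for illustration. It only mirrors the coarse structure the abstract describes (inner incremental sweeps over the summands combined with proximal steps); the paper's relative-smoothness/Bregman machinery, the never-asymptotically-triggered linesearch, and the L-BFGS directions are deliberately omitted, and every name and parameter here is a placeholder rather than the authors' implementation.

```python
import numpy as np

def soft_threshold(z, thresh):
    # Proximal map of g(x) = lam * ||x||_1, applied with threshold = step * lam.
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def incremental_prox_sketch(grad_fs, x0, lam=1e-2, step=0.05, epochs=200, seed=0):
    # Shuffled incremental proximal-gradient sweeps over the N smooth terms f_i,
    # with an l1 regularizer. Illustrative only: no Bregman distances, no
    # linesearch, no quasi-Newton (L-BFGS) direction as used in the paper.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        for i in rng.permutation(len(grad_fs)):      # inner incremental sweep
            x = soft_threshold(x - step * grad_fs[i](x), step * lam)
    return x

# Toy usage: sparse least squares with f_i(x) = 0.5 * (a_i @ x - b_i)**2.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = A @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.01 * rng.standard_normal(20)
grad_fs = [lambda x, a=a, bi=bi: a * (a @ x - bi) for a, bi in zip(A, b)]
x_hat = incremental_prox_sketch(grad_fs, x0=np.zeros(5))
```

With a constant stepsize, a plain incremental scheme like this generally settles only in a neighborhood of a solution; per the abstract, SPIRAL's linesearch (together with L-BFGS directions in the experiments) is what upgrades such incremental updates to superlinear convergence.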
