Self-concordant Smoothing for Large-Scale Convex Composite Optimization (2309.01781v2)

Published 4 Sep 2023 in math.OC and cs.LG

Abstract: We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other may be nonsmooth. The key highlight of our approach is a natural property of the resulting problem's structure which provides us with a variable-metric selection method and a step-length selection rule particularly suitable for proximal Newton-type algorithms. In addition, we efficiently handle specific structures promoted by the nonsmooth function, such as $\ell_1$-regularization and group-lasso penalties. We prove the convergence of two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm, and Prox-GGN-SCORE, a proximal generalized Gauss-Newton algorithm. The Prox-GGN-SCORE algorithm highlights an important approximation procedure which significantly reduces the computational overhead associated with the inverse Hessian. This approximation is especially useful for overparameterized machine learning models and in mini-batch settings. Numerical examples on both synthetic and real datasets demonstrate the efficiency of our approach and its superiority over existing approaches. A Julia package implementing the proposed algorithms is available at https://github.com/adeyemiadeoye/SelfConcordantSmoothOptimization.jl.
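To make the problem class concrete, the sketch below sets up the composite objective $\min_x f(x) + g(x)$ with a smooth least-squares term $f$ and a nonsmooth $\ell_1$ penalty $g$, and runs a proximal Newton-type iteration with a diagonal Hessian approximation so the scaled proximal step reduces to component-wise soft-thresholding. This is only a generic illustration under those simplifying assumptions; it is not the Prox-N-SCORE or Prox-GGN-SCORE algorithm, and the names `prox_newton_diag` and `soft_threshold` are illustrative rather than part of the SelfConcordantSmoothOptimization.jl API.

```julia
# Generic sketch: min_x 0.5*||Ax - b||^2 + lambda*||x||_1 via a proximal
# Newton-type step with a diagonal metric (NOT the paper's algorithms).
using LinearAlgebra

# Scalar soft-thresholding operator, the prox of the l1 penalty.
soft_threshold(z, tau) = sign(z) * max(abs(z) - tau, 0.0)

function prox_newton_diag(A, b, lambda; iters = 200)
    n = size(A, 2)
    x = zeros(n)
    # f(x) = 0.5*||Ax - b||^2, so grad f(x) = A'(Ax - b) and hess f = A'A.
    H = A' * A
    d = max.(diag(H), 1e-8)            # diagonal metric, kept positive
    for _ in 1:iters
        g = A' * (A * x - b)           # gradient of the smooth part
        # Scaled proximal step: with a diagonal metric the subproblem
        # decouples coordinate-wise into soft-thresholding updates.
        x = soft_threshold.(x .- g ./ d, lambda ./ d)
    end
    return x
end

# Tiny synthetic example: sparse recovery with an l1 penalty.
A = randn(50, 100)
x_true = zeros(100); x_true[1:5] .= 1.0
b = A * x_true
x_hat = prox_newton_diag(A, b, 0.1)
println("nonzeros recovered: ", count(abs.(x_hat) .> 1e-3))
```

A full proximal Newton method would use the complete (or generalized Gauss-Newton) curvature rather than its diagonal and solve the resulting scaled proximal subproblem inexactly; the diagonal choice here is only to keep the example self-contained.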
