Linearization Algorithms for Fully Composite Optimization (2302.12808v2)
Abstract: This paper studies first-order algorithms for solving fully composite optimization problems over convex and compact sets. We leverage the structure of the objective by handling its differentiable and non-differentiable components separately, linearizing only the smooth parts. This provides us with new generalizations of the classical Frank-Wolfe method and the Conditional Gradient Sliding algorithm that cater to a subclass of non-differentiable problems. Our algorithms rely on a stronger version of the linear minimization oracle, which can be efficiently implemented in several practical applications. We equip the basic version of our method with an affine-invariant analysis and prove global convergence rates for both convex and non-convex objectives. Furthermore, in the convex case, we propose an accelerated method with correspondingly improved complexity. Finally, we provide illustrative experiments to support our theoretical results.
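As a reference point for the generalization the abstract describes, the following is a minimal sketch of the classical Frank-Wolfe method that the paper builds on: the smooth objective is linearized at each iterate and a linear minimization oracle (LMO) over the feasible set picks the next direction. The quadratic objective, the probability-simplex feasible set, and all function names here are illustrative assumptions; the paper's stronger oracle, which additionally accounts for the non-differentiable component, is not modeled.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, num_iters=200):
    """Classical Frank-Wolfe over the probability simplex.

    Each step linearizes the smooth objective at the current iterate:
    the LMO returns the simplex vertex v minimizing <grad(x), v>,
    and the iterate moves toward v with the open-loop step 2/(k+2).
    """
    x = x0.copy()
    for k in range(num_iters):
        g = grad(x)
        v = np.zeros_like(x)
        v[np.argmin(g)] = 1.0        # LMO for the simplex: best vertex
        gamma = 2.0 / (k + 2.0)      # standard step-size schedule
        x = (1.0 - gamma) * x + gamma * v
    return x

# Example: minimize f(x) = 0.5 * ||A x - b||^2 over the simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
grad = lambda x: A.T @ (A @ x - b)
x0 = np.ones(3) / 3.0
x_star = frank_wolfe_simplex(grad, x0)
```

Because every update is a convex combination of simplex points, the iterates stay feasible without any projection, which is the practical appeal of linearization-based (projection-free) methods.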
- Hybrid conditional gradient-smoothing algorithms with applications to sparse and low rank regularization. Regularization, Optimization, Kernels, and Support Vector Machines, pages 53–82, 2014.
- A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
- The multiproximal linearization method for convex composite problems. Mathematical Programming, 182(1):1–36, 2020.
- New constraint qualification and conjugate duality for composed convex optimization problems. Journal of Optimization Theory and Applications, 135:241–255, 2007.
- A new constraint qualification for the formula of the subdifferential of composed convex functions in infinite dimensional spaces. Mathematische Nachrichten, 281(8):1088–1107, 2008.
- Convex optimization. Cambridge University Press, 2004.
- Conditional gradient methods. arXiv preprint arXiv:2211.14103, 2022.
- James V Burke. Descent methods for composite nondifferentiable optimization problems. Mathematical Programming, 33(3):260–279, 1985.
- James V Burke. Second order necessary and sufficient conditions for convex composite NDO. Mathematical Programming, 38:287–302, 1987.
- A Gauss–Newton method for convex composite optimization. Mathematical Programming, 71(2):179–194, 1995.
- A study of convex convex-composite functions via infimal convolution with applications. Mathematics of Operations Research, 46(4):1324–1348, 2021.
- Parameter-free locally accelerated conditional gradients. arXiv preprint arXiv:2102.06806, 2021.
- Accelerating Frank-Wolfe via averaging step directions. arXiv preprint arXiv:2205.11794, 2022.
- Boosting Frank-Wolfe by chasing gradients. In International Conference on Machine Learning, pages 2111–2121. PMLR, 2020.
- Complexity of linear minimization and projection on some sets. Operations Research Letters, 2021.
- Composite difference-max programs for modern statistical estimation problems. SIAM Journal on Optimization, 28(4):3344–3374, 2018.
- Multi-objective Bayesian optimization over high-dimensional search spaces. In Uncertainty in Artificial Intelligence, pages 507–517. PMLR, 2022.
- Welington De Oliveira. Short paper: A note on the Frank-Wolfe algorithm for a class of nonconvex and nonsmooth optimization problems. Open Journal of Mathematical Optimization, 4:1–10, 2023.
- Olivier Devolder. Exactness, inexactness and stochasticity in first-order methods for large-scale convex optimization. PhD thesis, ICTEAM and CORE, Université catholique de Louvain, 2013.
- Locally accelerated conditional gradients. In International Conference on Artificial Intelligence and Statistics, pages 1737–1747. PMLR, 2020.
- CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
- High-order optimization methods for fully composite problems. SIAM Journal on Optimization, 32(3):2402–2427, 2022.
- Error bounds, quadratic growth, and linear convergence of proximal methods. Mathematics of Operations Research, 43(3):919–948, 2018.
- Efficiency of minimizing compositions of convex functions and smooth maps. Mathematical Programming, 178(1):503–558, 2019.
- Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Information and Inference: A Journal of the IMA, 8(3):471–529, 2019.
- An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95–110, 1956.
- A linearly convergent variant of the conditional gradient algorithm under strong convexity, with applications to online and stochastic optimization. SIAM Journal on Optimization, 26(3):1493–1528, 2016.
- Donald W. Hearn. The gap function of a convex program. Operations Research Letters, 1(2):67–71, 1982. doi:10.1016/0167-6377(82)90049-9.
- Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In International Conference on Machine Learning, pages 427–435, 2013.
- On a Frank-Wolfe approach for abs-smooth functions. arXiv preprint arXiv:2303.09881, 2023.
- Simon Lacoste-Julien. Convergence rate of Frank-Wolfe for non-convex objectives. arXiv preprint arXiv:1607.00345, 2016.
- Guanghui Lan. The complexity of large-scale convex programming under a linear optimization oracle. arXiv preprint arXiv:1309.5550, 2013.
- Guanghui Lan and Yi Zhou. Conditional gradient sliding for convex optimization. SIAM Journal on Optimization, 26(2):1379–1409, 2016.
- Claude Lemaréchal. Cauchy and the gradient method. Documenta Mathematica, Extra Volume ISMP:251–254, 2012.
- Francesco Mezzadri. How to generate random matrices from the classical compact groups. arXiv preprint math-ph/0609050, 2006.
- Kaisa Miettinen. Nonlinear multiobjective optimization, volume 12. Springer Science & Business Media, 1999.
- Arkadi Nemirovski. Information-based complexity of convex programming. Lecture notes, 834, 1995.
- Problem complexity and method efficiency in optimization. Wiley-Interscience, 1983.
- Yurii Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2). In Doklady Akademii Nauk SSSR, volume 269, pages 543–547, 1983.
- Yurii Nesterov. Effective methods in nonlinear programming. Moscow, Radio i Svyaz, 1989.
- Yurii Nesterov. Modified Gauss–Newton scheme with worst case guarantees for global performance. Optimisation Methods and Software, 22(3):469–483, 2007.
- Yurii Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125–161, 2013.
- Yurii Nesterov. Complexity bounds for primal-dual methods minimizing the model of objective function. Mathematical Programming, 171(1):311–330, 2018a.
- Yurii Nesterov. Lectures on convex optimization, volume 137. Springer, 2018b.
- Interior-point polynomial algorithms in convex programming. SIAM, 1994.
- Teemu Pennanen. Graph-convex mappings and k-convex functions. Journal of Convex Analysis, 6(2):235–266, 1999.
- Non-convex conditional gradient sliding. In International Conference on Machine Learning, pages 4208–4217. PMLR, 2018.
- A deterministic nonsmooth Frank-Wolfe algorithm with coreset guarantees. INFORMS Journal on Optimization, 1(2):120–142, 2019.
- R Tyrrell Rockafellar. Convex analysis, volume 36. Princeton University Press, 1970.
- Minimization methods for non-differentiable functions. Springer, 1985.
- Projection efficient subgradient method and optimal nonsmooth Frank-Wolfe method. Advances in Neural Information Processing Systems, 33:12211–12224, 2020.
- Stochastic Gauss-Newton algorithms for nonconvex compositional optimization. In International Conference on Machine Learning, pages 9572–9582. PMLR, 2020.
- A conditional gradient framework for composite convex minimization with applications to semidefinite programming. In International Conference on Machine Learning, pages 5727–5736. PMLR, 2018.
- Conditional gradient methods via stochastic path-integrated differential estimator. In International Conference on Machine Learning, pages 7282–7291. PMLR, 2019.
- Random hypervolume scalarizations for provable multi-objective black box optimization. In International Conference on Machine Learning, pages 11096–11105. PMLR, 2020.
- Analysis of the Frank-Wolfe method for convex composite optimization involving a logarithmically-homogeneous barrier. Mathematical Programming, pages 1–41, 2022.