Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm (2307.01169v1)
Abstract: We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Łojasiewicz assumption that is faster than random selection and independent of the problem dimension $n$. We then consider minimizing with both a summation constraint and bound constraints, as arises in the support vector machine dual problem. Existing greedy rules for this setting either guarantee only trivial progress or require $O(n^2)$ time to compute. We show that bound- and summation-constrained steepest descent in the 1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.
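The following is a minimal sketch of the greedy 2-coordinate update for summation-constrained minimization described in the abstract, not the paper's exact implementation. It assumes the objective is $L$-smooth with a known Lipschitz constant `L`, and uses the standard greedy rule of shifting mass between the coordinates with the largest and smallest partial derivatives, which preserves the summation constraint; the step size, function names, and the quadratic test problem are illustrative choices, not taken from the paper.

```python
import numpy as np

def greedy_two_coordinate(grad_f, x, L, num_iters=100):
    """Greedy 2-coordinate descent for min f(x) subject to sum(x) = const.

    grad_f : callable returning the gradient of f at x
    x      : feasible starting point (its coordinate sum is preserved)
    L      : Lipschitz constant of grad_f (assumed known)
    """
    x = x.copy()
    for _ in range(num_iters):
        g = grad_f(x)
        i = int(np.argmax(g))   # coordinate with largest partial derivative (decrease it)
        j = int(np.argmin(g))   # coordinate with smallest partial derivative (increase it)
        if i == j:
            break
        # Step size minimizing the quadratic upper bound along d = e_j - e_i
        # (note ||d||^2 = 2), giving a guaranteed decrease of (g[i]-g[j])^2 / (4L).
        alpha = (g[i] - g[j]) / (2.0 * L)
        x[i] -= alpha
        x[j] += alpha
    return x

# Illustrative usage: minimize 0.5*||A x - b||^2 subject to sum(x) = 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
L = np.linalg.norm(A.T @ A, 2)          # Lipschitz constant of the gradient
x0 = np.ones(10) / 10.0                 # feasible start: sum(x0) = 1
x_hat = greedy_two_coordinate(lambda x: A.T @ (A @ x - b), x0, L, num_iters=500)
print(abs(x_hat.sum() - 1.0) < 1e-10)   # the summation constraint is preserved
```

Each iteration moves mass from the coordinate whose partial derivative is largest to the one whose partial derivative is smallest, so the sum of the variables never changes; this is the 2-coordinate analogue of steepest descent in the 1-norm restricted to the constraint set.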