Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm (2307.01169v1)

Published 3 Jul 2023 in math.OC, cs.LG, and stat.ML

Abstract: We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Łojasiewicz assumption that is faster than random selection and independent of the problem dimension $n$. We then consider minimizing with both a summation constraint and bound constraints, as arises in the support vector machine dual problem. Existing greedy rules for this setting either only guarantee trivial progress or require $O(n^2)$ time to compute. We show that bound- and summation-constrained steepest descent in the 1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.
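
To illustrate the connection described in the abstract, the sketch below shows a greedy 2-coordinate update for minimizing a smooth $f(x)$ subject to $\sum_i x_i = c$: the coordinate with the largest partial derivative gives mass to the coordinate with the smallest, which is the equality-constrained steepest descent direction in the 1-norm. This is a minimal sketch, not the paper's exact method; the fixed step based on an assumed Lipschitz constant `L`, the function name `greedy_two_coordinate`, and the toy quadratic objective are illustrative assumptions.

```python
# Illustrative sketch (assumptions noted above): greedy 2-coordinate update
# for min f(x) subject to sum(x) = c. Moving mass from the coordinate with
# the largest gradient entry to the one with the smallest preserves the sum
# and matches 1-norm steepest descent under the equality constraint.
import numpy as np

def greedy_two_coordinate(grad_f, x, L, iters=100):
    """grad_f: gradient oracle; L: assumed Lipschitz constant (illustrative)."""
    x = x.copy()
    for _ in range(iters):
        g = grad_f(x)
        i = int(np.argmax(g))              # coordinate to decrease
        j = int(np.argmin(g))              # coordinate to increase
        gamma = (g[i] - g[j]) / (2.0 * L)  # step minimizing a quadratic upper bound
        x[i] -= gamma
        x[j] += gamma                      # sum(x) is preserved by construction
    return x

# Toy usage on a separable quadratic, starting from a feasible point.
A = np.array([3.0, 1.0, 2.0])
grad = lambda x: A * x           # gradient of 0.5 * sum(A * x**2)
x0 = np.array([1.0, 1.0, 1.0])   # sum(x0) = 3 is maintained throughout
x_star = greedy_two_coordinate(grad, x0, L=A.max())
```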
