Dynamic Incremental Optimization for Best Subset Selection (2402.02322v5)

Published 4 Feb 2024 in cs.LG and stat.ML

Abstract: Best subset selection is considered the "gold standard" for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth, non-convex problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the structures of the primal and dual problems. By leveraging dual range estimation together with an incremental strategy, our algorithm potentially reduces redundant computation and improves the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions.
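For context, the $\ell_0$-regularized problems the abstract refers to are commonly written in the standard penalized least-squares form below; this is an illustrative sketch of the usual formulation only, since the exact problem family, its dual forms, and the primal-dual algorithm are defined in the paper itself.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Canonical $\ell_0$-regularized least-squares objective (illustrative; the
% paper's exact problem family and its dual derivation may differ).
\[
  \min_{\beta \in \mathbb{R}^p}
    \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_0,
  \qquad
  \lVert \beta \rVert_0 = \#\{\, j : \beta_j \neq 0 \,\}.
\]
% The cardinality-constrained variant underlying "best subset selection":
\[
  \min_{\beta \in \mathbb{R}^p} \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
  \quad \text{subject to} \quad \lVert \beta \rVert_0 \le k .
\]
\end{document}

The $\ell_0$ term counts nonzero coefficients, which is what makes the problem non-smooth and non-convex; working with dual forms, as the abstract describes, is one way to bound and prune the search over supports.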

