
Extended Comparisons of Best Subset Selection, Forward Stepwise Selection, and the Lasso (1707.08692v2)

Published 27 Jul 2017 in stat.ME and stat.CO

Abstract: In exciting new work, Bertsimas et al. (2016) showed that the classical best subset selection problem in regression modeling can be formulated as a mixed integer optimization (MIO) problem. Using recent advances in MIO algorithms, they demonstrated that best subset selection can now be solved at much larger problem sizes than what was thought possible in the statistics community. They presented empirical comparisons of best subset selection with other popular variable selection procedures, in particular, the lasso and forward stepwise selection. Surprisingly (to us), their simulations suggested that best subset selection consistently outperformed both methods in terms of prediction accuracy. Here we present an expanded set of simulations to shed more light on these comparisons. The summary is roughly as follows: (a) neither best subset selection nor the lasso uniformly dominate the other, with best subset selection generally performing better in high signal-to-noise (SNR) ratio regimes, and the lasso better in low SNR regimes; (b) best subset selection and forward stepwise perform quite similarly throughout; (c) the relaxed lasso (actually, a simplified version of the original relaxed estimator defined in Meinshausen, 2007) is the overall winner, performing just about as well as the lasso in low SNR scenarios, and as well as best subset selection in high SNR scenarios.

Citations (196)

Summary

  • The paper comprehensively compares Best Subset Selection, Forward Stepwise Selection, and Lasso methods for variable selection across various scenarios, evaluating their predictive performance.
  • It finds no universal dominance between Best Subset Selection and Lasso, with Best Subset performing better in high SNR and Lasso in low SNR conditions, while Forward Stepwise shows surprising similarity to Best Subset.
  • The study highlights the Relaxed Lasso as a versatile approach that adapts well across different SNR levels, effectively combining strengths of Best Subset and Lasso.

Analysis of Variable Selection Methods in Regression

This paper presents a comprehensive evaluation of several popular methods for variable selection in linear regression models: Best Subset Selection, Forward Stepwise Selection, and the Lasso, alongside a simplified version of the Relaxed Lasso. The investigation is motivated by recent advances in mixed integer optimization (MIO): formulating Best Subset Selection as an MIO problem has made it feasible to solve far larger instances than previously possible.

The paper's core contribution is a detailed comparison of these methods across various scenarios differing in signal-to-noise ratio (SNR), dimensionality, and sparsity of the true model coefficients. The analysis highlights three key findings:

  1. No Clear Dominance: Neither Best Subset Selection nor the Lasso universally outperforms the other. Best Subset Selection shows superior predictive performance in high SNR regimes, while the Lasso is preferable in low SNR regimes, where its shrinkage reduces variance.
  2. Similarity Between Best Subset and Forward Stepwise Selection: Forward Stepwise Selection and Best Subset Selection perform very similarly across most scenarios, which contrasts with prior findings suggesting substantial differences.
  3. Relaxed Lasso's Robust Performance: The Relaxed Lasso emerges as the most versatile approach, combining the strengths of both Best Subset and the Lasso under varying conditions. By tuning its extra shrinkage parameter, it adapts well across SNR levels (a minimal code sketch follows this list).
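
The simplified relaxed lasso studied in the paper blends the lasso solution at a given λ with the least squares refit on the lasso's active set: β̂(λ, γ) = γ·β̂_lasso(λ) + (1 − γ)·β̂_LS. Below is a minimal sketch in Python with scikit-learn; the paper's own implementation is its R package bestsubset, cross-validation over the (λ, γ) grid is omitted here, and note that scikit-learn's alpha corresponds to the lasso penalty scaled by 1/n:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def relaxed_lasso(X, y, lam, gamma):
    """Simplified relaxed lasso: gamma-blend of the lasso fit and the
    least squares refit restricted to the lasso's active set."""
    beta_lasso = Lasso(alpha=lam).fit(X, y).coef_
    active = np.flatnonzero(beta_lasso)
    if active.size == 0:
        return beta_lasso  # null model: nothing to refit
    beta_ls = np.zeros_like(beta_lasso)
    beta_ls[active] = LinearRegression().fit(X[:, active], y).coef_
    # gamma = 1 recovers the lasso; gamma = 0 is the debiased LS refit
    return gamma * beta_lasso + (1.0 - gamma) * beta_ls
```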

The paper's empirical study runs simulations across diverse settings, emulating realistic regression environments with varying degrees of predictor correlation, coefficient sparsity, and SNR. These setups allow model performance to be examined relative to oracle estimates and across several measures of predictive accuracy: Relative Risk (RR), Relative Test Error (RTE), and Proportion of Variance Explained (PVE), defined below.
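
In the paper's notation, for a test point $(x_0, y_0)$ drawn independently from the same distribution as the training data, with $y_0 = x_0^\top \beta_0 + \epsilon_0$ and noise variance $\sigma^2$, these metrics are:

```latex
\mathrm{RR}(\hat\beta) = \frac{\mathbb{E}\,(x_0^\top \hat\beta - x_0^\top \beta_0)^2}{\mathbb{E}\,(x_0^\top \beta_0)^2},
\qquad
\mathrm{RTE}(\hat\beta) = \frac{\mathbb{E}\,(y_0 - x_0^\top \hat\beta)^2}{\sigma^2},
\qquad
\mathrm{PVE}(\hat\beta) = 1 - \frac{\mathbb{E}\,(y_0 - x_0^\top \hat\beta)^2}{\mathrm{Var}(y_0)}
```

A perfect oracle achieves RR = 0, RTE = 1, and PVE = SNR/(1 + SNR).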

Key Methodological Approaches

  • Best Subset Selection is explored through an MIO framework using the Gurobi solver, enabling the evaluation of large-scale instances despite the problem's NP-hardness (one standard big-M formulation is sketched after this list). This approach is noted to be significantly more efficient than traditional branch-and-bound implementations.
  • Forward Stepwise Selection remains relevant, showing strong performance and computational efficiency thanks to its greedy, one-variable-at-a-time inclusion of predictors.
  • The Lasso, a convex relaxation of the subset selection problem, is known for its computational tractability and effectiveness in high-dimensional settings where combinatorial methods falter.
  • The Relaxed Lasso enhances the Lasso by mitigating its inherent shrinkage, balancing aggressive variable selection against stable coefficient estimation.
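
For reference, a standard big-M formulation of the best subset problem at sparsity level k (Bertsimas et al. also employ SOS-1 constraints; the constant M must upper-bound the optimal coefficients, and choosing it well is itself nontrivial):

```latex
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \; \|y - X\beta\|_2^2
\quad \text{s.t.} \quad
-M z_j \le \beta_j \le M z_j, \;\; j = 1, \dots, p,
\qquad
\sum_{j=1}^{p} z_j \le k
```

Setting $z_j = 0$ forces $\beta_j = 0$, so the binary budget constraint caps the number of active predictors at k.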

Computational Considerations

While computations for the Lasso and Relaxed Lasso are comparably cheap, the MIO-based Best Subset Selection demands significantly more computational resources, particularly as the subset size or problem dimensionality grows. The implementation imposed a time limit of three minutes per subset size (see the sketch below), so the returned solutions are not guaranteed to be optimal in every instance.
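
As a concrete illustration of such a budgeted solve, here is a minimal sketch of the big-M formulation above in Gurobi's Python interface with a 180-second limit per call. The paper itself drives Gurobi from R via its bestsubset package, and the bound M here is a hypothetical user-supplied constant, not the paper's data-driven choice:

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

def best_subset_mio(X, y, k, M=10.0, time_limit=180):
    """Big-M MIO for best subset selection: minimize ||y - X beta||^2
    with at most k nonzero coefficients. M is an assumed bound on |beta_j|."""
    n, p = X.shape
    m = gp.Model("best-subset")
    m.Params.TimeLimit = time_limit  # stop after time_limit seconds and
    m.Params.OutputFlag = 0          # return the incumbent solution
    beta = m.addVars(p, lb=-M, ub=M, name="beta")
    z = m.addVars(p, vtype=GRB.BINARY, name="z")
    for j in range(p):
        m.addConstr(beta[j] <= M * z[j])   # z_j = 0 forces beta_j = 0
        m.addConstr(beta[j] >= -M * z[j])
    m.addConstr(gp.quicksum(z[j] for j in range(p)) <= k)  # sparsity budget
    resid = [y[i] - gp.quicksum(X[i, j] * beta[j] for j in range(p))
             for i in range(n)]
    m.setObjective(gp.quicksum(r * r for r in resid), GRB.MINIMIZE)
    m.optimize()
    return np.array([beta[j].X for j in range(p)])
```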

Implications and Future Directions

The findings underscore the importance of context in selecting an appropriate variable selection method. They suggest that while advancements in optimization have made traditional combinatorial approaches more feasible, they do not outright replace methods like the Lasso for practical applications, especially in high-dimensional settings with low SNR. The Relaxed Lasso's performance positions it as a flexible and powerful tool in many real-world applications.

Future research could focus on exploring hybrid methods that incorporate advantages of multiple strategies, further enhancements in computational efficiency for MIO formulations, and investigation into other estimation metrics such as those focused on variable recovery. Additionally, real-world applications extending beyond simulations could provide further insights into method efficacy across different domains. The paper's R package, bestsubset, sets a foundation for such explorations, enabling reproducibility and extended simulation studies.