
Best Subset Selection via a Modern Optimization Lens (1507.03133v1)

Published 11 Jul 2015 in stat.ME, math.OC, stat.CO, and stat.ML

Abstract: In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We present a MIO approach for solving the classical best subset selection problem of choosing $k$ out of $p$ features in linear regression given $n$ observations. We develop a discrete extension of modern first order continuous optimization methods to find high quality feasible solutions that we use as warm starts to a MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with $n$ in the 1000s and $p$ in the 100s in minutes to provable optimality, and finds near optimal solutions for $n$ in the 100s and $p$ in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than Lasso and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power.

Citations (634)

Summary

  • The paper presents a Mixed Integer Optimization framework that guarantees global optimality for the best subset selection problem.
  • It leverages discrete first-order methods to reduce computational burden while achieving near-optimal solutions for large-scale datasets.
  • Extensive experiments demonstrate that the approach attains superior predictive accuracy and model sparsity compared to traditional methods like Lasso.

Essay on Best Subset Selection via a Modern Optimization Lens

The paper "Best Subset Selection via a Modern Optimization Lens" by Dimitris Bertsimas, Angela King, and Rahul Mazumder addresses the classical best subset selection problem in linear regression, formulated as choosing the optimal subset of k features from a pool of p features based on n observations. This problem has traditionally been challenging because it is NP-hard, particularly with larger p, where contemporary combinatorial approaches scale inadequately.

Approach

The authors present a methodology that pairs Mixed Integer Optimization (MIO) with discrete extensions of modern first-order continuous optimization methods. The first-order methods cheaply produce high-quality feasible solutions; supplied to the MIO solver as warm starts, they let it certify global optimality, or bound the suboptimality of the current solution if the solver is terminated early.

  1. MIO Framework: The subset selection problem is cast as a mixed integer program whose solution comes with a certificate of global optimality. The authors formulate the cardinality constraint using Specially Ordered Sets of type 1 (SOS-1), which tie each coefficient to a binary indicator without requiring explicit big-$M$ bounds (see the formulation after this list).
  2. Algorithmic Advancements: Novel discrete first-order methods extend traditional continuous optimization to the sparse setting, efficiently producing near-optimal solutions that serve as warm starts and thereby reduce the total computational burden of the MIO solve (a numpy sketch follows below).
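
Concretely, the SOS-1 device couples each coefficient $\beta_j$ with a binary indicator $z_j$; a basic version of the formulation (sketched here from the description above, omitting the paper's additional strengthening constraints) is

$$\min_{\beta,\, z}\ \frac{1}{2}\,\lVert y - X\beta \rVert_2^2 \quad \text{s.t.} \quad (\beta_j,\, 1 - z_j)\colon \text{SOS-1}, \quad z_j \in \{0, 1\}, \quad \sum_{j=1}^{p} z_j \le k,$$

so that $z_j = 0$ forces $\beta_j = 0$ and at most $k$ coefficients can be active.

The discrete first-order step amounts to projected gradient descent onto the set of $k$-sparse vectors: take a gradient step, then keep only the $k$ largest-magnitude coordinates. Below is a minimal numpy sketch of this scheme (the function name, iteration budget, and least-squares polishing step are illustrative choices, not the authors' code):

```python
import numpy as np

def discrete_first_order(X, y, k, n_iter=500, tol=1e-6):
    """Projected gradient with hard thresholding for
    min 0.5 * ||y - X b||^2  subject to  ||b||_0 <= k.

    Illustrative sketch; not the authors' implementation.
    """
    n, p = X.shape
    # Step size 1/L, where L bounds the gradient's Lipschitz constant:
    # the largest eigenvalue of X^T X (squared spectral norm of X).
    L = np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)               # gradient of the LS loss
        candidate = beta - grad / L               # plain gradient step
        new_beta = np.zeros(p)
        keep = np.argsort(np.abs(candidate))[-k:]  # k largest-magnitude entries
        new_beta[keep] = candidate[keep]           # hard-threshold to k-sparse
        if np.linalg.norm(new_beta - beta) <= tol:
            beta = new_beta
            break
        beta = new_beta
    # Polish: refit unrestricted least squares on the final support.
    support = np.flatnonzero(beta)
    if support.size:
        coef = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        beta = np.zeros(p)
        beta[support] = coef
    return beta
```

Solutions of this kind are then handed to the MIO solver as warm starts; the solver either certifies global optimality or, if stopped early, reports how far the incumbent can be from optimal.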

Numerical Experiments and Results

This paper substantiates the efficacy of the proposed methods through extensive testing on a variety of synthetic and real datasets. The experiments demonstrate:

  • The ability to solve subset selection problems with n in the thousands and p in the hundreds within minutes, attaining provable optimality.
  • Near-optimal solutions within minutes in high-dimensional settings with n in the hundreds and p in the thousands, with the statistical properties of the selected models corroborated by the experimental analysis.

Perhaps most notably, the MIO approach outperforms Lasso—a commonly employed technique for sparse learning—in terms of selecting sparser and more predictively accurate models.
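
As a toy illustration of this kind of comparison (a reconstruction in miniature, not the paper's experimental setup or scale), one can run exhaustive best subset search on a small correlated design and set it against cross-validated Lasso, whose shrinkage typically leads it to select a larger support:

```python
import itertools
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, k = 100, 12, 3

# Correlated Gaussian design (AR(1)-style covariance) and a k-sparse truth.
cov = 0.8 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
beta_true = np.zeros(p)
beta_true[:k] = 1.5
y = X @ beta_true + rng.normal(size=n)

# Exhaustive best subset search -- feasible only because p is tiny here.
best_rss, best_support = np.inf, None
for support in itertools.combinations(range(p), k):
    cols = list(support)
    coef = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
    rss = np.sum((y - X[:, cols] @ coef) ** 2)
    if rss < best_rss:
        best_rss, best_support = rss, support

# Cross-validated Lasso for comparison.
lasso = LassoCV(cv=5).fit(X, y)

print("true support:  ", tuple(range(k)))
print("best subset:   ", best_support)
print("Lasso support: ", tuple(np.flatnonzero(lasso.coef_)))
```

On problems of realistic size, exhaustive enumeration is hopeless; closing that gap is precisely what the paper's MIO formulation and first-order warm starts are designed to do.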

Statistical Implications

The subset selection achieved by their method shows improved predictive performance because variables are selected exactly, rather than through the approximation inherent in convex relaxations such as Lasso. The empirical demonstrations confirm this advantage and clarify when conventional approaches falter, particularly in regimes of high feature correlation, where Lasso's shrinkage bias tends to retain spurious variables.

Future Directions

The paper suggests several promising directions for further research:

  • Integrating richer side constraints on the regression coefficients within the MIO framework.
  • Extending the discrete optimization framework to other loss functions and to regression variants beyond the linear model.
  • Exploring scalable algorithmic variants that retain statistical robustness in diverse high-dimensional application domains.

Conclusions

In summary, the authors introduce a robust optimization-based approach to tackle the intractable best subset selection problem, demonstrating significant advancements both computationally and statistically. Their results underscore the power of modern MIO techniques in addressing classical, high-complexity problems, marking an important evolution in optimization methodologies applicable to statistical learning challenges.