- The paper establishes that computationally efficient improper learning methods for sparse linear regression require roughly Ω(k²) samples, in contrast to the information-theoretically optimal Θ(k log(d/k)) bound.
- It uses a reduction from sparse PCA to reveal that polynomial-time algorithms incur substantially higher sample complexity in high-dimensional, correlated random design settings.
- The findings underscore the need for innovative algorithmic strategies to bridge the gap between statistical optimality and computational feasibility in high-dimensional data analysis.
Exploring the Computational-Statistical Gaps in Sparse Linear Regression
Introduction
In the domain of machine learning, sparse linear regression models have garnered significant attention because in many applications the true model is believed to be sparse: the goal is to predict an output variable as a linear combination of a small subset of only k of the d predictor variables, with k much smaller than d. This paper explores the computational-statistical gaps associated with improper learning in sparse linear regression, focusing on the (correlated) random design setting.
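To make the random-design setting concrete, the sketch below draws data from a k-sparse linear model with a correlated Gaussian design. The dimensions, correlation pattern, and noise level are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of a k-sparse linear regression instance with a correlated
# Gaussian random design. All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 500, 10, 200      # ambient dimension, sparsity, sample size (assumed)
sigma = 0.5                 # noise standard deviation (assumed)
rho = 0.3                   # pairwise correlation of the design (assumed)

# Equicorrelated covariance: unit variances, correlation rho between coordinates.
Sigma = (1 - rho) * np.eye(d) + rho * np.ones((d, d))
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# k-sparse regression vector: only k of the d coefficients are nonzero.
beta = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
beta[support] = rng.normal(size=k)

# Responses depend linearly on the k active predictors, plus Gaussian noise.
y = X @ beta + sigma * rng.normal(size=n)
```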
Computational-Statistical Gaps
At the heart of this analysis is the exploration of sample complexity (the minimum number of samples required to achieve non-trivial prediction error) while accounting for computational efficiency. Information-theoretically, the picture is well understood: Θ(k log(d/k)) samples are necessary and sufficient to achieve low prediction error. What can be achieved by efficient (polynomial-time) algorithms is far less clear, particularly without additional model restrictions. Known polynomial-time algorithms typically require substantially more samples, often Ω(d), exposing a gap between what is computationally feasible and what is statistically optimal.
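For a rough sense of scale, the snippet below evaluates the three rates mentioned above at one arbitrary choice of d and k; the specific numbers are purely illustrative.

```python
# Back-of-the-envelope comparison of the sample-complexity scales discussed
# above, at one arbitrary (assumed) choice of d and k.
import math

d, k = 10_000, 100

info_theoretic = k * math.log(d / k)  # Theta(k log(d/k)): statistically optimal
conjectured = k ** 2                  # Omega(k^2): conjectured floor for efficient improper learners
known_efficient = d                   # Omega(d): what many known polynomial-time algorithms need

print(f"k log(d/k) ~ {info_theoretic:.0f}")  # about 461
print(f"k^2        = {conjectured}")         # 10000
print(f"d          = {known_efficient}")     # 10000
```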
Hardness in Improper Learning
This work puts forward evidence for a lower bound on the sample complexity of efficient algorithms for improper learning of sparse linear regression. It demonstrates that achieving prediction error comparable to the information-theoretic optimum likely necessitates Ω(k²) samples, a significant departure from the Θ(k log(d/k)) bound and evidence of a substantial computational-statistical gap. This assertion is substantiated through a reduction from sparse PCA with a negative spike, a problem widely believed to be computationally intractable in certain sample regimes.
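The reduction starts from the negatively spiked sparse PCA distribution, which in its commonly studied form draws samples from N(0, I − θ·vvᵀ) for a k-sparse unit vector v. The sketch below samples from such a distribution; the parameter values are assumptions chosen only for illustration.

```python
# Minimal sketch of the negative-spike sparse PCA distribution:
# x ~ N(0, I - theta * v v^T) for a k-sparse unit vector v.
# Parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

d, k, n = 500, 10, 300
theta = 0.5                 # spike strength; 0 < theta < 1 keeps the covariance positive definite

# Hidden k-sparse unit vector v.
v = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
v[support] = 1.0 / np.sqrt(k)

# Covariance with a negative rank-one spike along v.
Sigma = np.eye(d) - theta * np.outer(v, v)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# Detecting the spike (or recovering v) from X is conjectured to be hard for
# polynomial-time algorithms when n is much smaller than roughly k^2, which is
# what drives the Omega(k^2) lower bound described above.
```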
Theoretical Implications and Practical Relevance
The findings present a nuanced understanding of the limitations inherent in current algorithmic approaches for sparse linear regression under improper learning settings. From a theoretical perspective, they align with the broader narrative within statistical learning that identifies stark differences in the sample complexities necessary for statistical consistency versus those required for computational feasibility. Practically, these results serve as a critical consideration for researchers and practitioners working on high-dimensional data, advising caution against over-reliance on computational shortcuts which may not provide statistically robust estimates.
Future Directions
The conjectured computational-statistical gap opens several avenues for future research. A pivotal question is whether novel algorithmic frameworks or learning regimes could narrow this gap. Furthermore, alternative models that go beyond the Gaussian assumptions or explore structured sparsity could offer new insights, potentially leading to more efficient algorithms that do not compromise on the optimal sample complexity. Lastly, a deeper exploration into the nature of improper learning, in different statistical models, could unearth broader principles governing the interplay between computational efficiency and statistical rigour.
Conclusion
By articulating the apparent computational limitations in achieving statistically optimal sample complexity for sparse linear regression in high-dimensional settings, this paper adds a critical dimension to the discourse on efficient learning algorithms. It underscores the necessity for the continued development of algorithmic techniques that better bridge the computational-statistical divide, an endeavor that remains paramount in the advancement of machine learning and data science.