Role of data structure in resampling performance for high-dimensional regression

Investigate whether and how structural properties of the training data influence the accuracy and consistency of resampling-based bias and variance estimates in high-dimensional regularized generalized linear models. The methods in question are the pair bootstrap, residual bootstrap, subsampling, and jackknife; the goal is to determine the cases where such structure helps or hinders them.

Background

Throughout the paper, the analysis assumes i.i.d. Gaussian covariates and focuses on how resampling methods behave under this design in the proportional high-dimensional limit. The authors identify regimes where resampling methods under- or over-estimate uncertainty, and regimes where regularization mitigates these issues.

The conclusion explicitly asks whether structural features in the data might alter these behaviors, suggesting a systematic investigation of non-Gaussian designs, correlations, or other dependencies in covariates, and their impact on the reliability of resampling-based uncertainty quantification in high dimensions.
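
As a rough illustration of the kind of experiment this question points to, the sketch below compares a pair-bootstrap variance estimate for ridge regression against a Monte Carlo reference, once with i.i.d. Gaussian covariates and once with correlated ones. It is not the authors' setup: the AR(1)-style covariance `rho ** |i - j|`, the function names, and all parameter values are assumptions chosen for illustration, and ridge regression stands in for the broader class of regularized generalized linear models.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimator: argmin ||y - X w||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def sample_design(n, d, rho, rng):
    """Gaussian covariates with covariance Sigma[i, j] = rho ** |i - j|.

    rho = 0.0 gives the i.i.d. standard Gaussian design; rho > 0 adds
    correlation between neighbouring features (an assumed toy structure).
    """
    idx = np.arange(d)
    cov = float(rho) ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(cov)
    return rng.standard_normal((n, d)) @ chol.T


def sample_data(theta, n, rho, noise, rng):
    """Draw one training set from a linear model with Gaussian noise."""
    X = sample_design(n, theta.size, rho, rng)
    y = X @ theta + noise * rng.standard_normal(n)
    return X, y


def pair_bootstrap_variance(X, y, lam, n_boot, rng):
    """Total variance (summed over coordinates) of ridge fits on resampled pairs."""
    n, d = X.shape
    fits = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        fits[b] = ridge_fit(X[idx], y[idx], lam)
    return fits.var(axis=0, ddof=1).sum()


def reference_variance(theta, n, rho, noise, lam, n_rep, rng):
    """Monte Carlo variance of the ridge estimator over fresh datasets."""
    fits = np.empty((n_rep, theta.size))
    for r in range(n_rep):
        X, y = sample_data(theta, n, rho, noise, rng)
        fits[r] = ridge_fit(X, y, lam)
    return fits.var(axis=0, ddof=1).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, lam, noise = 200, 100, 1.0, 1.0  # illustrative values only
    theta = rng.standard_normal(d) / np.sqrt(d)
    for rho in (0.0, 0.5):
        X, y = sample_data(theta, n, rho, noise, rng)
        boot = pair_bootstrap_variance(X, y, lam, n_boot=200, rng=rng)
        ref = reference_variance(theta, n, rho, noise, lam, n_rep=200, rng=rng)
        print(f"rho={rho}: bootstrap={boot:.4f}, reference={ref:.4f}")
```

Comparing the gap between the bootstrap and reference values across `rho` gives a first, informal look at whether covariate correlation changes the bootstrap's reliability. A serious study would average over many training sets and vary the ratio `n / d` and the penalty `lam`.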

References

Avenues for future work are manifold. For instance, how would our results change in a misspecified scenario? Can structure in the data help or hinder resampling methods? These interesting questions are left for future investigation.

Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression (2402.13622 - Clarté et al., 21 Feb 2024) in Conclusion and Perspectives