Role of data structure in resampling performance for high-dimensional regression

Investigate whether and how structural properties of the training data influence the accuracy and consistency of resampling-based bias and variance estimates (specifically pair bootstrap, residual bootstrap, subsampling, and jackknife) in high-dimensional regularized generalized linear models, and determine the cases where such structure helps or hinders these methods.

Background

The study assumes i.i.d. Gaussian covariates throughout and characterizes how resampling methods behave under this design in the proportional high-dimensional limit, where the number of samples and the number of features grow at a fixed ratio. The authors identify regimes where resampling methods under- or over-estimate uncertainty, and regimes where regularization mitigates these errors.
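To make the setting concrete, the following is a minimal sketch of a pair bootstrap variance estimate for ridge regression under an i.i.d. Gaussian design, compared against a Monte Carlo reference obtained by redrawing fresh datasets. It is an illustration under stated assumptions, not the authors' procedure: the linear model with Gaussian noise, the function names (`ridge_fit`, `pair_bootstrap_variance`, `monte_carlo_variance`), and all parameter values are hypothetical choices made here.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimator: argmin_w ||y - X w||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def pair_bootstrap_variance(X, y, lam, n_boot, rng):
    """Resample (x_i, y_i) pairs with replacement and refit each time.
    Returns the variance of the refitted estimators, averaged over coordinates."""
    n, d = X.shape
    fits = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n row indices drawn with replacement
        fits[b] = ridge_fit(X[idx], y[idx], lam)
    return fits.var(axis=0).mean()

def monte_carlo_variance(n, d, lam, w_star, noise_std, n_trials, rng):
    """Reference value: variance of the estimator over fresh datasets,
    drawn from an assumed linear model with i.i.d. Gaussian covariates."""
    fits = np.empty((n_trials, d))
    for t in range(n_trials):
        X = rng.standard_normal((n, d))
        y = X @ w_star + noise_std * rng.standard_normal(n)
        fits[t] = ridge_fit(X, y, lam)
    return fits.var(axis=0).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, lam, noise_std = 200, 100, 1.0, 0.5  # illustrative values only
    w_star = rng.standard_normal(d) / np.sqrt(d)
    X = rng.standard_normal((n, d))
    y = X @ w_star + noise_std * rng.standard_normal(n)
    v_boot = pair_bootstrap_variance(X, y, lam, n_boot=200, rng=rng)
    v_ref = monte_carlo_variance(n, d, lam, w_star, noise_std, n_trials=200, rng=rng)
    print(f"pair bootstrap: {v_boot:.4f}   Monte Carlo reference: {v_ref:.4f}")
```

Varying the ratio `n / d` and the penalty `lam` in such a simulation is one way to see where the bootstrap estimate and the reference variance diverge.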

The conclusion explicitly asks whether structure in the data might alter these behaviors. This motivates a systematic investigation of non-Gaussian designs, correlated covariates, and other dependencies, and of their impact on the reliability of resampling-based uncertainty quantification in high dimensions.
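As one possible starting point for such an investigation, the sketch below generates three covariate designs that could be swapped into a resampling experiment: the i.i.d. Gaussian baseline, a correlated Gaussian design, and a heavy-tailed non-Gaussian design. The specific choices (an AR(1) covariance with entries `rho ** |i - j|`, standardized Student-t entries) and the function names are assumptions made here for illustration; the source does not prescribe any particular structured design.

```python
import numpy as np

def gaussian_iid(n, d, rng):
    """Baseline design: i.i.d. standard Gaussian entries."""
    return rng.standard_normal((n, d))

def gaussian_ar1(n, d, rho, rng):
    """Correlated design: rows ~ N(0, Sigma) with Sigma[i, j] = rho ** |i - j|.
    Requires |rho| < 1 so that Sigma is positive definite."""
    idx = np.arange(d)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(sigma)  # sigma = chol @ chol.T
    return rng.standard_normal((n, d)) @ chol.T

def student_t_iid(n, d, df, rng):
    """Heavy-tailed design: i.i.d. Student-t entries rescaled to unit variance.
    Requires df > 2 so that the variance is finite."""
    return rng.standard_t(df, size=(n, d)) * np.sqrt((df - 2) / df)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    designs = {
        "iid Gaussian": gaussian_iid(5000, 4, rng),
        "AR(1), rho=0.5": gaussian_ar1(5000, 4, 0.5, rng),
        "Student-t, df=5": student_t_iid(5000, 4, 5, rng),
    }
    for name, X in designs.items():
        print(name)
        print(np.round(np.cov(X, rowvar=False), 2))  # empirical covariance
```

All three designs have unit-variance coordinates, so differences in resampling behavior across them would come from correlation or tail structure rather than from scale.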

References

Avenues for future work are manifold. For instance, how would our results change in a misspecified scenario? Can structure in the data help or hinder resampling methods? These interesting questions are left for future investigation.

Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression (2402.13622 - Clarté et al., 2024) in Conclusion and Perspectives