Misspecification effects on resampling in high-dimensional GLMs

Determine how the asymptotic characterization of pair bootstrap, residual bootstrap, subsampling, and jackknife, and the resulting conclusions about their bias and variance estimates, change under model misspecification in high-dimensional regularized generalized linear models, i.e., when the data-generating process departs from the Gaussian covariate design and well-specified likelihood assumed in the analysis.

Background

The paper provides a rigorous asymptotic analysis of four resampling methods (pair bootstrap, residual bootstrap, subsampling, and jackknife) for estimating bias and variance in high-dimensional regularized regression and classification, focusing on generalized linear models with Gaussian covariates and well-specified likelihoods. The main results characterize overlaps via state evolution and detail where resampling methods succeed or fail depending on the sample-to-dimension ratio and regularization.
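
To make the four resampling schemes concrete, here is a minimal Python sketch that applies them to ridge regression on synthetic Gaussian data. It is an illustration under stated assumptions, not the paper's procedure or code: the function names, the closed-form ridge estimator, the subsampling fraction `frac`, and the variance summaries are all hypothetical choices made for this example. The subsampling output is left unrescaled, since the appropriate correction depends on a convention not specified here.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimator: argmin_w ||y - X w||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def pair_bootstrap(X, y, lam, n_rep, rng):
    """Refit on n (x_i, y_i) pairs drawn with replacement."""
    n = X.shape[0]
    fits = []
    for _ in range(n_rep):
        idx = rng.integers(0, n, size=n)
        fits.append(ridge_fit(X[idx], y[idx], lam))
    return np.array(fits)


def residual_bootstrap(X, y, lam, n_rep, rng):
    """Keep X fixed; resample centered residuals of the full-data fit."""
    w_hat = ridge_fit(X, y, lam)
    resid = y - X @ w_hat
    resid = resid - resid.mean()
    fits = []
    for _ in range(n_rep):
        y_star = X @ w_hat + rng.choice(resid, size=len(y), replace=True)
        fits.append(ridge_fit(X, y_star, lam))
    return np.array(fits)


def subsample(X, y, lam, n_rep, rng, frac=0.8):
    """Refit on m = int(frac * n) pairs drawn without replacement."""
    n = X.shape[0]
    m = int(frac * n)
    fits = []
    for _ in range(n_rep):
        idx = rng.choice(n, size=m, replace=False)
        fits.append(ridge_fit(X[idx], y[idx], lam))
    return np.array(fits)


def jackknife(X, y, lam):
    """Leave-one-out refits, one per sample."""
    n = X.shape[0]
    keep = ~np.eye(n, dtype=bool)  # row i masks out sample i
    return np.array([ridge_fit(X[keep[i]], y[keep[i]], lam) for i in range(n)])


def mean_coordinate_variance(fits):
    """Per-coordinate sample variance across refits, averaged over coordinates."""
    return fits.var(axis=0, ddof=1).mean()


def jackknife_variance(fits):
    """Classical jackknife variance, (n-1)/n * sum of squares, averaged over coordinates."""
    n = fits.shape[0]
    return ((n - 1) / n * ((fits - fits.mean(axis=0)) ** 2).sum(axis=0)).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, lam, noise_std = 200, 100, 1.0, 0.5  # illustrative values only
    w_star = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))  # Gaussian covariates, well-specified linear model
    y = X @ w_star + noise_std * rng.normal(size=n)

    w_hat = ridge_fit(X, y, lam)
    pb = pair_bootstrap(X, y, lam, 200, rng)
    rb = residual_bootstrap(X, y, lam, 200, rng)
    # Bootstrap bias estimate: mean of the refits minus the full-data fit.
    print("pair bootstrap bias norm:", np.linalg.norm(pb.mean(axis=0) - w_hat))
    print("pair bootstrap variance:", mean_coordinate_variance(pb))
    print("residual bootstrap variance:", mean_coordinate_variance(rb))
    print("subsampling variance (unrescaled):",
          mean_coordinate_variance(subsample(X, y, lam, 200, rng)))
    print("jackknife variance:", jackknife_variance(jackknife(X, y, lam)))
```

A misspecified experiment in the spirit of the open question could replace the Gaussian `X` or the Gaussian noise in the demo with another distribution and compare how the estimates move. That is a suggestion for exploration, not a result from the paper.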

In the conclusion, the authors explicitly raise the question of misspecification, asking how their findings would change when the underlying statistical model is not correctly specified. This points to extending the analysis beyond the well-specified Gaussian design and log-concave likelihood setting and assessing robustness of resampling-based uncertainty quantification to deviations from these assumptions.

References

Avenues for future work are manifold. For instance, how would our results change in a misspecified scenario? Can structure in the data help or hinder resampling methods? These interesting questions are left for future investigation.

Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression (2402.13622 - Clarté et al., 21 Feb 2024) in Conclusion and Perspectives