Bootstrap Methods for Conditional Coverage
- Bootstrapping for conditional coverage is a methodology that uses resampling procedures to adjust confidence intervals following data-driven selection, ensuring target frequentist coverage.
- It addresses the shortcomings of standard marginal intervals by providing rank-specific adjustments that mitigate issues like the winner's curse in high-dimensional settings.
- Recent bootstrap algorithms offer shorter, less conservative intervals with efficient computation for both selective inference and predictive interval recalibration.
Bootstrapping for conditional coverage refers to the use of bootstrap-based procedures to construct confidence or prediction intervals that achieve target frequentist coverage not only on average (unconditionally), but also after data-driven selection events, or conditional on particular features of the data such as ranks. The motivation is that standard interval constructions often fail to deliver the desired coverage in settings involving selection and high-dimensional inference, where the probability that a reported interval covers the true parameter, given the observed data or selection, can deviate substantially from the nominal level. Recent work provides formal bootstrap algorithms that restore or approximate conditional coverage guarantees in complex selection and ranking regimes, with a focus on intervals that remain short and less conservative than traditional adjustments.
1. Conditional Coverage and Its Failures in Standard Inference
Conditional coverage is the probability that an interval covers the target parameter given the observed data, selection event, or ranking, rather than in an unconditional, marginal sense. In high-dimensional problems where many parameters are estimated and only the most significant are reported, a standard (marginal) confidence interval for a parameter $\theta_i$ fails to account for selection effects. This yields intervals that dramatically under-cover parameters associated with the most extreme estimates (the "winner's curse") and over-cover those associated with moderate estimates: coverage probabilities can be near zero for top ranks and near one for middle ranks, averaging to the nominal level only globally. Such deficiencies persist for many selection-adjusted or FCR-controlling procedures, which often merely inflate all intervals symmetrically without addressing the erratic coverage by rank (Morrison et al., 2017).
2. Rank Conditional Coverage and Its Formalization
Rank Conditional Coverage (RCC) is the coverage criterion that quantifies expected coverage by rank after sorting parameter estimates by some measure of significance. For a (data-dependent) ranking permutation $\pi$, with $\hat\theta_{\pi(1)}$ the most significant estimate, the rank-$r$ RCC of a family of intervals $\{\mathrm{CI}_i\}$ is defined as

$$\mathrm{RCC}_r = P\bigl(\theta_{\pi(r)} \in \mathrm{CI}_{\pi(r)}\bigr).$$

A procedure controls RCC at level $1-\alpha$ if $\mathrm{RCC}_r \ge 1-\alpha$ for each rank $r$ of practical interest (typically the top ranks, $r \le K$). This notion is motivated by the fact that discovery reporting in genomics, imaging, and high-throughput screening is rank-driven, concentrating inferential attention on the top findings, for which marginal intervals offer misleading coverage properties (Morrison et al., 2017).
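The rank-wise failure of marginal intervals is easy to reproduce. The following minimal simulation (an illustration, not code from the cited papers) estimates the RCC of naive 95% marginal CIs under a global null, an extreme case in which the winner's curse is most visible:

```python
# Estimate rank conditional coverage (RCC) of naive marginal 95% CIs by
# simulation, under a global null with unit-variance Gaussian estimates.
import numpy as np

rng = np.random.default_rng(0)
p, n_sims, z = 1000, 500, 1.96       # parameters, simulation rounds, 95% normal quantile
theta = np.zeros(p)                  # global null: every true effect is zero

coverage_by_rank = np.zeros(p)
for _ in range(n_sims):
    est = theta + rng.normal(size=p)                 # unit-variance estimates
    order = np.argsort(-np.abs(est))                 # rank by significance |est|
    hit = (theta >= est - z) & (theta <= est + z)    # does each marginal CI cover?
    coverage_by_rank += hit[order]                   # accumulate coverage per rank

rcc = coverage_by_rank / n_sims
print("RCC at ranks 1, 50, 500:", rcc[0], rcc[49], rcc[499])
print("marginal (average) coverage:", rcc.mean())    # ~0.95 despite rank imbalance
```

The top ranks cover at rates near zero while middle ranks cover at rates near one, exactly the pattern described above, even though the average coverage sits at the nominal 95%.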
3. Bootstrap Algorithms for Achieving Conditional Coverage
3.1. RCC via Bootstrap in High Dimensions
The bootstrap enables estimation of the conditional law of an estimator, and in particular of its bias distribution at each rank. If the joint distribution of $\hat\theta$ given $\theta$ were known, one could simulate bootstrap draws $\hat\theta^{(b)}$, rank each draw, and for rank $r$ aggregate the bias draws $\hat\theta^{(b)}_{\pi_b(r)} - \theta_{\pi_b(r)}$ across replicates $b$. The empirical quantiles $q^{(r)}_{\alpha/2}$ and $q^{(r)}_{1-\alpha/2}$ of these draws yield order-statistic-specific bias corrections, forming intervals

$$\Bigl[\hat\theta_{\pi(r)} - q^{(r)}_{1-\alpha/2},\ \hat\theta_{\pi(r)} - q^{(r)}_{\alpha/2}\Bigr]$$

that control $\mathrm{RCC}_r$ exactly in this oracle setting.
In practice, the parametric bootstrap for RCC replaces the unknown $\theta$ with plug-in estimates or debiased values $\tilde\theta$ and simulates

$$\hat\theta^{(b)} \sim N\bigl(\tilde\theta,\ \hat\Sigma\bigr),$$

with subsequent sorting and bias estimation as above, ultimately producing intervals

$$\Bigl[\hat\theta_{\pi(r)} - \hat q^{(r)}_{1-\alpha/2},\ \hat\theta_{\pi(r)} - \hat q^{(r)}_{\alpha/2}\Bigr],$$

where $\hat q^{(r)}_{\alpha/2}$ and $\hat q^{(r)}_{1-\alpha/2}$ are the empirical $\alpha/2$ and $1-\alpha/2$ quantiles of the rank-$r$ bias draws. For more general settings, the nonparametric bootstrap applies: resample the raw data, recompute all estimators, re-rank, and construct the corresponding bias distributions (Morrison et al., 2017).
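A compact sketch of this parametric procedure, with illustrative names (not the interface of the rcc package), might look as follows:

```python
# Sketch of the parametric bootstrap for rank-conditional intervals, along the
# lines described above: simulate from a plug-in Gaussian model, re-rank each
# draw, collect rank-specific bias draws, and invert their quantiles.
import numpy as np

def rcc_parametric_bootstrap(est, se, B=2000, alpha=0.1, rng=None):
    """Rank-conditional intervals from a plug-in Gaussian bootstrap.

    est : observed estimates, used as plug-in values for the truth
    se  : standard errors; draws are independent N(est, se^2)
    Returns (lo, hi) arrays ordered by the observed significance ranking.
    """
    rng = rng or np.random.default_rng()
    p = len(est)
    bias = np.empty((B, p))
    for b in range(B):
        boot = est + se * rng.normal(size=p)     # theta*_b ~ N(est, se^2)
        order = np.argsort(-np.abs(boot / se))   # re-rank every bootstrap draw
        bias[b] = (boot - est)[order]            # rank-r bias: theta*_(r) minus plug-in truth
    q_lo = np.quantile(bias, alpha / 2, axis=0)      # rank-specific quantiles
    q_hi = np.quantile(bias, 1 - alpha / 2, axis=0)
    obs_order = np.argsort(-np.abs(est / se))        # ranking of the observed estimates
    est_ranked = est[obs_order]
    return est_ranked - q_hi, est_ranked - q_lo      # interval for theta at each rank

# toy usage: five estimates with unit standard errors
lo, hi = rcc_parametric_bootstrap(np.array([3.1, 0.2, -1.4, 2.7, 0.9]),
                                  se=np.ones(5), B=5000, alpha=0.1)
```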
3.2. Bootstrap for Selective and Event-Conditional Inference
Conditional selective inference, where the selection event may depend on the data through a complex, algorithmic rule, can also be addressed with bootstrapping. One estimates the conditional probability of selection by repeatedly generating bootstrap datasets, running the selection algorithm, and fitting a flexible classifier (e.g., a neural network or logistic regression) to approximate the selection probability $\hat p(t) \approx P(\text{selected} \mid T = t)$ as a function of the test statistic. The distribution of the statistic conditional on the selection event is then estimated by tilting the pre-selection (Gaussian) law by $\hat p$, producing valid confidence intervals and p-values for the selected model or target (Liu et al., 2022).
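The following self-contained sketch illustrates the tilting idea on a one-dimensional statistic; the toy threshold rule, the grid, and the classifier choice are assumptions for illustration, not the procedure of Liu et al. (2022):

```python
# Estimate p(t) = P(selected | T = t) from reruns of a black-box selection
# rule, then tilt the pre-selection N(0,1) law by p_hat to obtain the
# selection-conditional distribution of the statistic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def selected(t):                      # toy black-box rule: report only large T
    return t > 1.5

# 1) Bootstrap-style reruns: draw statistics under the reference law and
#    record whether the selection rule fires.
t_boot = rng.normal(0.0, 1.0, size=20000)
y = selected(t_boot).astype(int)
clf = LogisticRegression().fit(t_boot[:, None], y)   # smooth estimate of p(t)

# 2) Tilt the pre-selection density by p_hat(t) on a grid and normalize
#    to get the selection-conditional law.
grid = np.linspace(-5, 8, 4001)
phi = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)
p_hat = clf.predict_proba(grid[:, None])[:, 1]
cdf = np.cumsum(phi * p_hat)
cdf /= cdf[-1]

# 3) Selection-conditional p-value for an observed statistic under H0: mu = 0.
t_obs = 2.2
print("conditional p-value:", 1.0 - np.interp(t_obs, grid, cdf))
```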
3.3. Conditional Prediction Intervals via Bootstrap
For conditional prediction intervals in regression, the residual-based bootstrap interval (RB) provides only unconditional coverage: the difference between achieved conditional and nominal coverage is asymptotically normal with mean zero, so the conditional coverage falls below the nominal level with asymptotic probability $1/2$. A refined “Residual Bootstrap with Unconditional Guarantee” (RBUG) corrects the critical quantile by a small tilt, calibrated using additional bootstrap samples, to guarantee asymptotic conditional coverage together with a prescribed lower bound on the probability that the conditional coverage exceeds the nominal level (Zhang et al., 2020).
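A simplified sketch of the residual bootstrap prediction interval appears below, with a fixed quantile tilt `delta` standing in for the data-driven RBUG recalibration; the tilt value, function name, and toy model are illustrative assumptions:

```python
# Residual-bootstrap prediction interval for linear regression, with a
# quantile-widening knob 'delta' (RBUG chooses such a tilt from auxiliary
# bootstrap rounds; here it is a fixed illustrative constant).
import numpy as np

def rb_prediction_interval(X, y, x0, B=2000, alpha=0.1, delta=0.0, rng=None):
    rng = rng or np.random.default_rng()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    resid = resid - resid.mean()                 # centered residuals
    draws = np.empty(B)
    for b in range(B):
        y_b = X @ beta + rng.choice(resid, size=len(y), replace=True)
        beta_b, *_ = np.linalg.lstsq(X, y_b, rcond=None)
        # future-observation error: fresh noise minus estimation error at x0
        draws[b] = rng.choice(resid) - x0 @ (beta_b - beta)
    lo_q = max(alpha / 2 - delta, 0.0)           # tilted quantile levels
    hi_q = min(1 - alpha / 2 + delta, 1.0)
    pred = x0 @ beta
    return pred + np.quantile(draws, lo_q), pred + np.quantile(draws, hi_q)

# toy usage on a simulated linear model
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
print(rb_prediction_interval(X, y, x0=np.array([1.0, 0.5]), delta=0.01, rng=rng))
```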
4. Theoretical Guarantees
For RCC, provided the bootstrap is consistent for the rank-specific bias distribution (whether parametric or nonparametric), the constructed intervals attain rank-conditional coverage converging to the nominal level, $\mathrm{RCC}_r \to 1-\alpha$, at each rank $r$. Moreover, if $\mathrm{RCC}_r \ge 1-\alpha$ for every $r$, then the false coverage-statement rate (FCR) for the set of top $K$ selected parameters (under the given ranking) is also controlled at level $\alpha$ (Morrison et al., 2017).
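The FCR implication follows by averaging the rank-wise guarantees over a fixed top-$K$ report:

$$\mathrm{FCR} = \mathbb{E}\Bigl[\tfrac{1}{K}\textstyle\sum_{r=1}^{K} \mathbf{1}\{\theta_{\pi(r)} \notin \mathrm{CI}_{\pi(r)}\}\Bigr] = \tfrac{1}{K}\textstyle\sum_{r=1}^{K}\bigl(1-\mathrm{RCC}_r\bigr) \le \alpha,$$

since each summand is at most $\alpha$ under rank-wise control.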
For bootstrap selective inference, under asymptotic normality, bootstrap consistency, Lipschitz continuity of the conditional selection law, and accurate estimation of the selection probability, inversion of the estimated selection-conditional cumulative distribution function yields intervals whose coverage converges to the nominal level uniformly over large classes of models (Liu et al., 2022).
For prediction intervals, the refined RBUG/PRBUG methods achieve that the random conditional coverage converges in probability to $1-\alpha$, and the probability that it falls below $1-\alpha$ approaches the prescribed guarantee level (Zhang et al., 2020).
5. Algorithmic and Computational Aspects
Bootstrap RCC intervals are implemented in the R package rcc, which provides both parametric (function par_bs_ci) and nonparametric (nonpar_bs_ci) interfaces. Key arguments include the observed statistics, standard errors, data, number of bootstrap replicates $B$, target coverage, and ranking function. Each parametric bootstrap iteration costs roughly $O(p \log p)$ for $p$ parameters (simulation plus ranking), with practical runtimes on the order of seconds for typical problem sizes. The nonparametric bootstrap is costlier, proportional to $B$ times the original model-fitting cost (Morrison et al., 2017).
For selective inference, computation is dominated by the need to rerun the selection algorithm $B$ times, but the reruns are embarrassingly parallel (as the sketch below illustrates). Fitting the selection-probability estimator (e.g., logistic regression or a neural network) adds further cost, but this is typically manageable in moderate dimensions (Liu et al., 2022).
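Because each rerun is independent, the standard library suffices for parallelism; a minimal self-contained sketch, with a toy rule standing in for an arbitrary black-box selector:

```python
# Run B independent selection reruns in parallel with the standard library.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def selection_rerun(seed):
    rng = np.random.default_rng(seed)
    data = rng.normal(size=100)          # stands in for one bootstrap dataset
    return float(data.mean()) > 0.1      # toy black-box selection rule

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:                       # independent reruns
        flags = list(pool.map(selection_rerun, range(2000)))  # B = 2000
    print(sum(flags) / len(flags))       # empirical selection frequency
```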
For bootstrapped prediction intervals with conditional guarantees, the standard RB costs $B$ times the cost of fitting the model, and the guarantee-corrected RBUG/PRBUG variants require additional recalibration loops over auxiliary bootstrap samples (Zhang et al., 2020).
6. Empirical Performance and Practical Insights
Empirical evaluations across synthetic and semi-synthetic regimes confirm severe under-coverage at high ranks for marginal CIs and for many selective CI methods. For example, in independent Gaussian models the nominal marginal CIs exhibit near-zero coverage at the top ranks, while RCC-bootstrapped intervals achieve near-nominal coverage uniformly across the top 200 ranks, with intervals at least $20\%$ shorter than Bonferroni or FCR-adjusted intervals (Morrison et al., 2017).
Selective inference via black-box bootstrap achieves nominal coverage and reduced length in a variety of settings, including randomized Lasso, knockoff selection, Benjamini–Hochberg screening, and repeated testing. The intervals are generally much shorter and more reliable than naive or sample-splitting alternatives (Liu et al., 2022).
For prediction, standard bootstrapped intervals have only a 50% chance of providing conditional coverage, while the guarantee-corrected RBUG/PRBUG intervals substantially reduce the frequency of under-covered data points (only a small fraction of test points under-covered, depending on the tuned guarantee level), with only a mild increase in average interval length (Zhang et al., 2020).
7. Applications and Scope
Bootstrapping for conditional coverage is directly applicable in high-dimensional regression, genomics, imaging, multiple testing, clinical trials, and any context where parameter selection or ranking is data-dependent. The methodology is robust to diverse correlation structures and sparsity levels and can be adapted to cover regression, prediction, and complex selection regimes, including arbitrary selection events defined by black-box algorithms. The approach is generic and modular, requiring only access to the selection or ranking rule, the estimators, and an estimator for the error or bias distribution, making it widely implementable in contemporary high-dimensional data analysis workflows (Morrison et al., 2017, Liu et al., 2022, Zhang et al., 2020).