Rank Conditional Coverage (RCC)
- Rank Conditional Coverage (RCC) is defined as the probability that the true parameter lies within a constructed confidence interval given its empirical rank.
- RCC addresses the under-coverage of conventional marginal intervals by building rank-specific intervals that maintain nominal coverage for extreme estimates.
- Bootstrap methods, both parametric and non-parametric, are used to estimate bias distributions, making RCC robust for high-dimensional inference and predictive applications.
Rank Conditional Coverage (RCC) is a statistical framework for evaluating and constructing confidence intervals in the context of large-scale inference, with a particular focus on the coverage properties conditional on the empirical ranking of parameter estimates or conformity scores. RCC provides an explicit answer to the well-documented failure of marginal confidence intervals to maintain nominal coverage rates at the ranks of most scientific interest—that is, for the most extreme or “significant” estimates. Recent developments extend RCC to predictive set construction via rectified conformal prediction. The RCC concept addresses high-dimensional problems in which multiple parameters, tests, or predictions must be jointly analyzed, and selection or reporting bias poses a major challenge to reliable inference (Morrison et al., 2017, Plassier et al., 22 Feb 2025).
1. Formal Definition of Rank Conditional Coverage
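Let $\theta_1, \dots, \theta_p$ be parameters of interest, with point estimates $\hat\theta_1, \dots, \hat\theta_p$. Estimates are ranked by significance (e.g., by absolute $t$-statistic), with $s(i)$ denoting the index of the $i$-th most significant parameter so that $|t_{s(1)}| \ge |t_{s(2)}| \ge \dots \ge |t_{s(p)}|$. The Rank Conditional Coverage at rank $i$ is
$$\mathrm{RCC}(i) = P\left(\theta_{s(i)} \in CI_{s(i)}\right),$$
where $CI_{s(i)}$ is the confidence set or interval for the $i$-th ranked estimate. RCC can be equivalently expressed as
$$\mathrm{RCC}(i) = E\left[\mathbf{1}\{\theta_{s(i)} \in CI_{s(i)}\}\right].$$
$\mathrm{RCC}(i)$ thus gives the expected coverage rate specifically at rank $i$ over repeated sampling (Morrison et al., 2017).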
2. Motivations: Marginal Coverage Failure and the Superiority of RCC
Conventional marginal confidence intervals are designed so that
$$P\left(\theta_j \in CI_j\right) \ge 1 - \alpha \quad \text{for each fixed } j,$$
but this marginal guarantee masks a pronounced under-coverage for the most extreme (top-ranked) estimates and over-coverage for typical or median ones. When scientific or reporting interest is focused on the top estimates (e.g., in biomarker discovery or variable selection), the realized coverage among those selected ranks may fall well below $1 - \alpha$.
Selection-adjusted procedures (e.g., False Coverage-Statement Rate (FCR) control) address average coverage among selected parameters, typically by inflating all intervals, but still produce substantial under-coverage among the most extreme ranks. By shifting the criterion to $\mathrm{RCC}(i) \ge 1 - \alpha$ for all $i$, procedures can guarantee, rank by rank, that the observed coverage matches the nominal level (Morrison et al., 2017).
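The gap between marginal and rank-conditional coverage can be illustrated with a small Monte Carlo sketch. The setting is hypothetical and deliberately extreme (100 true effects all equal to zero, unit standard errors, ranking by absolute estimate), and the function name is illustrative only; it is not taken from the cited work.

```python
import random

def marginal_coverage_by_rank(theta, n_rep=2000, z=1.645, seed=0):
    """Monte Carlo estimate of RCC(i) for marginal intervals est_j +/- z (unit SE)."""
    rng = random.Random(seed)
    p = len(theta)
    hits = [0] * p
    for _ in range(n_rep):
        est = [t + rng.gauss(0.0, 1.0) for t in theta]
        # order[i] is the index of the (i+1)-th largest absolute estimate
        order = sorted(range(p), key=lambda j: -abs(est[j]))
        for i, j in enumerate(order):
            if abs(est[j] - theta[j]) <= z:
                hits[i] += 1
    return [h / n_rep for h in hits]

theta = [0.0] * 100  # assumption: every true effect is zero
cov = marginal_coverage_by_rank(theta)
print("coverage at top rank:   ", cov[0])
print("coverage at median rank:", cov[50])
print("average over all ranks: ", sum(cov) / len(cov))
```

Averaged over ranks, the nominal 90% level is approximately recovered, while the top rank is covered far less often and middle ranks far more often, which is the pattern the marginal guarantee hides.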
3. Construction of RCC-Controlled Intervals
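The central methodological innovation behind RCC is to build intervals that achieve asymptotic coverage $1 - \alpha$ at each rank. This is operationalized via the estimation of the rank-specific bias distribution:
$$\delta_{[i]} = \hat\theta_{s(i)} - \theta_{s(i)}.$$
Denoting its cumulative distribution by $H_{[i]}$, the (oracle) RCC-exact interval is
$$\left(\hat\theta_{s(i)} - H_{[i]}^{-1}(1 - \alpha/2),\; \hat\theta_{s(i)} - H_{[i]}^{-1}(\alpha/2)\right),$$
so that $\mathrm{RCC}(i) = 1 - \alpha$ for every $i$ in finite samples.
Since $H_{[i]}$ is unknown, it is estimated by bootstrap (parametric or non-parametric):
- Parametric bootstrap: If the estimates are independent with $\hat\theta_j \sim N(\theta_j, \sigma_j^2)$ (or another known parametric family), simulate $\theta_j^* \sim N(\hat\theta_j, \hat\sigma_j^2)$, rerank to obtain $s^*(i)$, and compute $\delta^*_{[i]} = \theta^*_{s^*(i)} - \hat\theta_{s^*(i)}$. Use empirical quantiles of the $\delta^*_{[i]}$ for interval endpoints.
- Non-parametric bootstrap: Generate bootstrap datasets, re-estimate $\hat\theta_j^*$, rerank, and compute $\delta^*_{[i]} = \hat\theta^*_{s^*(i)} - \hat\theta_{s^*(i)}$.
This bootstrap approach yields intervals of the form
$$\left(\hat\theta_{s(i)} - \hat H_{[i]}^{-1}(1 - \alpha/2),\; \hat\theta_{s(i)} - \hat H_{[i]}^{-1}(\alpha/2)\right),$$
where $\hat H_{[i]}$ is the empirical distribution of the bootstrap biases at rank $i$. Under standard regularity (a consistent bootstrap law), these intervals asymptotically achieve $\mathrm{RCC}(i) = 1 - \alpha$ simultaneously for all $i$ (Morrison et al., 2017).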
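The parametric bootstrap construction can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the reference implementation: estimates are treated as independent normals with known standard errors, ranking is by the signed statistic (largest first) so that the rank-specific bias has a consistent sign, and the function names and the simulated data are hypothetical.

```python
import random

def quantile(sorted_xs, q):
    """Empirical quantile by linear interpolation on a sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])

def rank_order(values, se):
    """Indices sorted by signed statistic values[j] / se[j], largest first."""
    return sorted(range(len(values)), key=lambda j: -values[j] / se[j])

def rcc_parametric_bootstrap(est, se, level=0.90, n_boot=1000, seed=0):
    """Return (index, lower, upper) per rank, using bootstrap rank-specific biases."""
    rng = random.Random(seed)
    p = len(est)
    alpha = 1.0 - level
    bias = [[] for _ in range(p)]  # bias[i] collects bootstrap biases at rank i
    for _ in range(n_boot):
        boot = [rng.gauss(est[j], se[j]) for j in range(p)]
        for i, j in enumerate(rank_order(boot, se)):
            bias[i].append(boot[j] - est[j])
    intervals = []
    for i, j in enumerate(rank_order(est, se)):
        d = sorted(bias[i])
        lower = est[j] - quantile(d, 1 - alpha / 2)
        upper = est[j] - quantile(d, alpha / 2)
        intervals.append((j, lower, upper))
    return intervals

data_rng = random.Random(1)
est = [data_rng.gauss(0.0, 1.0) for _ in range(50)]  # assumption: all true effects are 0
se = [1.0] * 50
intervals = rcc_parametric_bootstrap(est, se)
j, lower, upper = intervals[0]
print("top-ranked estimate:", est[j], "interval:", (lower, upper))
```

Because the bias at the top rank is mostly positive, the resulting interval is shifted below the top-ranked estimate rather than centered on it; the lowest rank is shifted in the opposite direction.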
4. Theoretical Properties and Implications
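Oracle RCC intervals satisfy $\mathrm{RCC}(i) = 1 - \alpha$ in finite samples. Bootstrap-based intervals achieve this property asymptotically and uniformly over $i$, i.e.,
$$\max_{1 \le i \le p} \left|\mathrm{RCC}(i) - (1 - \alpha)\right| \to 0$$
as the number of bootstrap samples $B \to \infty$ and $n \to \infty$. An important corollary is that any procedure reporting the top $k$ estimates will have overall FCR $\le \alpha$ provided $\mathrm{RCC}(i) \ge 1 - \alpha$ for all $i$, making RCC control a pointwise strengthening of FCR control.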
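The corollary follows from linearity of expectation. The short derivation below is a sketch under two assumptions that are not spelled out in the text: the number of reported estimates $k \ge 1$ is fixed in advance, and FCR is taken to be the expected proportion of reported intervals that miss their parameter.

```latex
\begin{align*}
\mathrm{FCR}_k
  &= E\left[\frac{1}{k}\sum_{i=1}^{k} \mathbf{1}\{\theta_{s(i)} \notin CI_{s(i)}\}\right] \\
  &= \frac{1}{k}\sum_{i=1}^{k} \bigl(1 - \mathrm{RCC}(i)\bigr) \\
  &\le \frac{1}{k}\sum_{i=1}^{k} \alpha \;=\; \alpha .
\end{align*}
```

The converse does not hold: an average miscoverage of at most $\alpha$ over the top $k$ ranks is compatible with severe miscoverage at rank 1, which is why rank-wise control is the stronger requirement.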
Simulation studies further demonstrate that RCC intervals uniformly maintain target coverage at all ranks and outperform both marginal and FCR-adjusted intervals, especially at extremes where miscoverage is most severe (Morrison et al., 2017).
5. RCC in Predictive Inference and Rectified Conformal Prediction
In predictive inference, the concept of RCC has been adopted to address analogous failures of conditional coverage in conformal prediction frameworks. Classical split conformal prediction guarantees marginal coverage $P(Y \in C(X)) \ge 1 - \alpha$, but may provide sub-nominal coverage over subsets defined by the rank of conformity scores.
Recent work introduces an explicit score-rectification mechanism: via regression, estimate the conditional $(1 - \alpha)$-quantile $\hat q(x)$ of conformity scores $V(x, y)$ given $X = x$, and transform raw scores as $\tilde V(x, y) = f_{\hat q(x)}^{-1}(V(x, y))$ for a monotonic family $\{f_t\}$ (for example, additive $f_t(v) = t + v$ or multiplicative $f_t(v) = t v$). Applying ordinary split conformal prediction to these rectified scores ensures coverage that is nearly uniform both marginally and over strata defined by the empirical rank of test conformity scores, i.e., RCC (Plassier et al., 22 Feb 2025).
Theoretical bounds confirm that the resulting coverage conditional on covariates, and thus conditional on rank strata, approaches $1 - \alpha$ provided the quantile regression is accurate. Empirical studies in multi-output prediction highlight that RCC-conformal methods reduce the maximal conditional coverage error compared to non-RCC approaches (Plassier et al., 22 Feb 2025).
6. Software Implementations and Illustrative Examples
The R package “rcc” implements both parametric and non-parametric bootstrap methods for RCC interval construction. The package provides utilities for ranking by signed or absolute test statistics and outputs rank-ordered intervals. Basic usage includes:
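```r
# Parametric bootstrap RCC intervals from point estimates and standard errors
ci_par <- par_bs_ci(est = theta.hat, se = se.hat, level = 0.90, nboot = 1000)
# Non-parametric bootstrap RCC intervals from raw data and an estimator function
ci_np <- nonpar_bs_ci(data = data, estFUN = estFUN, level = 0.90, nboot = 500)
```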
7. Practical Considerations, Extensions, and Current Research
A practical guideline for achieving RCC in predictive inference is to choose a meaningful conformity score $V(x, y)$, split the calibration data so that one part is reserved for quantile regression, estimate the conditional quantile $\hat q(x)$ of the score, apply the rectification transformation (additive or multiplicative), and run split conformal prediction on the rectified scores. This recipe is model- and score-agnostic, and applies equally to multi-output and structured prediction tasks, provided a scalar conformity score can be evaluated (Plassier et al., 22 Feb 2025).
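The recipe can be sketched end to end on synthetic data. Everything specific here is an assumption made for illustration: the data are heteroscedastic noise around a known predictor $\mu(x) = 0$, the score is $V(x, y) = |y|$, the rectification is multiplicative, and a crude per-bin empirical quantile stands in for a proper quantile regression. Coverage is checked on a covariate stratum rather than on rank strata, and none of the function names come from the cited work.

```python
import math
import random

def empirical_quantile(values, q):
    """Smallest sample value with at least a fraction q of the sample at or below it."""
    xs = sorted(values)
    k = min(len(xs) - 1, max(0, math.ceil(q * len(xs)) - 1))
    return xs[k]

def fit_binned_quantile(xs, scores, q, n_bins=10):
    """Stand-in for quantile regression: per-bin empirical quantile for x in [0, 1)."""
    bins = [[] for _ in range(n_bins)]
    for x, v in zip(xs, scores):
        bins[min(n_bins - 1, int(x * n_bins))].append(v)
    # Fall back to the global quantile if a bin happens to be empty
    table = [empirical_quantile(b if b else scores, q) for b in bins]
    return lambda x: table[min(n_bins - 1, int(x * n_bins))]

def conformal_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    xs = sorted(scores)
    k = math.ceil((len(xs) + 1) * (1 - alpha))
    return xs[k - 1] if k <= len(xs) else float("inf")

def sample(n, rng):
    """Synthetic data: noise standard deviation grows with x."""
    xs = [rng.random() for _ in range(n)]
    ys = [rng.gauss(0.0, 0.2 + 2.0 * x) for x in xs]
    return xs, ys

rng = random.Random(0)
alpha = 0.1
x_fit, y_fit = sample(2000, rng)    # used only to estimate the conditional quantile
x_cal, y_cal = sample(2000, rng)    # used only for conformal calibration
x_test, y_test = sample(4000, rng)

q_hat = fit_binned_quantile(x_fit, [abs(y) for y in y_fit], 1 - alpha)
t_raw = conformal_threshold([abs(y) for y in y_cal], alpha)
t_rect = conformal_threshold([abs(y) / q_hat(x) for x, y in zip(x_cal, y_cal)], alpha)

def stratum_coverage(threshold_fn, lo, hi):
    """Fraction of test points with lo <= x < hi whose score is within the threshold."""
    hits = [abs(y) <= threshold_fn(x) for x, y in zip(x_test, y_test) if lo <= x < hi]
    return sum(hits) / len(hits)

raw_hi = stratum_coverage(lambda x: t_raw, 0.8, 1.0)
rect_hi = stratum_coverage(lambda x: t_rect * q_hat(x), 0.8, 1.0)
rect_all = stratum_coverage(lambda x: t_rect * q_hat(x), 0.0, 1.0)
print("raw scores, high-noise stratum:      ", raw_hi)
print("rectified scores, high-noise stratum:", rect_hi)
print("rectified scores, all test points:   ", rect_all)
```

Keeping the quantile-fitting split separate from the calibration split is what preserves the marginal guarantee of split conformal prediction; the rectification only changes which score is calibrated.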
RCC implies stronger guarantees than FCR for selection procedures, aligns coverage properties with scientific practice (publication of top-ranked findings), and extends to nonparametric, correlated, and structured inference settings. Ongoing investigations include the statistical and computational tradeoffs associated with complex quantile regression estimators for rectification, and the precise characterization of RCC in dependent or high-dimensional settings.