Reweighted Conformal Prediction Procedure
- The paper introduces a method that guarantees both marginal and mask-conditional coverage by adjusting prediction intervals after imputation.
- It employs a preimpute–mask–then–correct pipeline using importance weighting or acceptance-rejection to handle various missing data mechanisms like MCAR, MAR, and MNAR.
- Empirical evaluations demonstrate up to a 30% reduction in prediction interval width while maintaining reliable coverage across different datasets.
A reweighted conformal prediction procedure is a statistical method designed to provide reliable uncertainty quantification when features (covariates) in the dataset are subject to missingness. The foundational challenge it addresses is that classical conformal prediction (CP) does not guarantee valid coverage in the presence of missing covariates, especially when coverage must be controlled not only marginally, but for each possible missingness pattern (known as mask-conditional coverage, MCV). The reweighted conformal approach provides both marginal and mask-conditional coverage guarantees by correcting prediction intervals post-imputation, via importance reweighting or acceptance-rejection schemes. This enables compatibility with standard distributional imputation pipelines while delivering sharper, adaptively valid prediction sets even under complex missing data mechanisms, such as MCAR, MAR, or MNAR (Fan et al., 16 Dec 2025).
1. Problem Formulation and Background
Consider a supervised regression or classification framework where the data-generating process produces i.i.d. triples $(X, M, Y)$, with $X \in \mathbb{R}^d$ the covariate vector of dimension $d$, $M \in \{0,1\}^d$ a mask indicating missingness ($M_j = 1$ if $X_j$ is missing, $0$ otherwise), and $Y$ the response or label. The observed features are $X_{\mathrm{obs}(M)}$, with unobserved entries replaced by NA. In this context, conformal prediction aims to construct a prediction set $\hat{C}_\alpha$ such that, for new data,

$$\mathbb{P}\big(Y \in \hat{C}_\alpha(X_{\mathrm{obs}(M)}, M) \,\big|\, M = m\big) \ge 1 - \alpha$$

for all masks $m$ with nonzero probability mass. Standard split conformal prediction furnishes only marginal validity (MV) under exchangeability; mask-conditional validity (MCV) is generally not guaranteed, especially under heteroskedastic or non-random missing patterns (Fan et al., 16 Dec 2025).
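As a concrete check of the MCV target above, mask-conditional coverage can be estimated empirically by stratifying interval hits by missingness pattern. A minimal NumPy sketch; the function name and interval representation are illustrative, not from the paper:

```python
import numpy as np

def mask_conditional_coverage(y, lower, upper, masks):
    """Empirical coverage of intervals [lower, upper], stratified by mask.

    masks: (n, d) boolean array; each row is a pattern m (True = missing).
    Returns a dict mapping each observed pattern to its coverage rate.
    """
    covered = (lower <= y) & (y <= upper)
    out = {}
    for m in np.unique(masks, axis=0):           # iterate distinct patterns
        idx = (masks == m).all(axis=1)           # rows with this pattern
        out[tuple(m)] = covered[idx].mean()
    return out
```

MCV requires every value in the returned dict to be at least $1-\alpha$, not just their average.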
2. The Preimpute–Mask–Then–Correct Framework
To handle missing covariates, the framework consists of the following pipeline:
- Preimpute: Each calibration sample $(X_i, M_i, Y_i)$ is subjected to distributional imputation via any probabilistic mechanism $\mathcal{I}$, yielding an imputed vector $\tilde{X}_i \sim \mathcal{I}(\cdot \mid X_{i,\mathrm{obs}(M_i)}, M_i)$ that agrees with $X_i$ on the observed entries.
- Mask: For a test-time missingness pattern (mask) $m$, each imputed calibration instance is re-masked so that only the features indicated as observed by $m$ are retained.
- Correct: A correction step is applied to address the distributional shift between the law of the imputed-masked calibration set ($\tilde{P}_m$) and the true mask-conditional law ($P_m$). This is achieved either by reweighting calibration points via the likelihood ratio $w = dP_m / d\tilde{P}_m$, or by acceptance-rejection sampling to simulate draws from $P_m$.
A schematic table of the high-level steps:
| Step | Description | Output |
|---|---|---|
| Preimpute | Draw imputed $\tilde{X}_i$ for each calibration point $i$ | Completed calibration set |
| Mask | Apply the test mask $m$ to each $\tilde{X}_i$ | Masked inputs $\tilde{X}_{i,\mathrm{obs}(m)}$ |
| Correct | Apply weighting or rejection sampling for coverage | Adjusted calibration scores |
This method leverages the calibration regime of split CP and is agnostic to the specific imputation mechanism $\mathcal{I}$.
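The preimpute-then-mask steps can be sketched as follows; the `imputer_sample` interface is a hypothetical stand-in for any distributional imputer, and the function name is illustrative:

```python
import numpy as np

def preimpute_mask(X_cal, M_cal, m_test, imputer_sample, rng):
    """Preimpute-mask step: fill each calibration row probabilistically,
    then re-mask it to the test-time pattern m_test (True = missing).

    imputer_sample(x, mask, rng) -> one imputed draw of the full vector;
    any distributional imputer can be plugged in (assumed interface).
    """
    X_imp = np.array([imputer_sample(x, m, rng)
                      for x, m in zip(X_cal, M_cal)])
    X_masked = X_imp.copy()
    X_masked[:, m_test] = np.nan   # hide the features missing at test time
    return X_masked
```

The correction step (weighting or rejection) is then applied to nonconformity scores computed on this masked calibration set.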
3. Weighted and Acceptance–Rejection Correction Methods
3.1 Weighted Conformal Prediction
Under the absolute-continuity assumption $P_m \ll \tilde{P}_m$, compute the likelihood ratio (importance weight)

$$w(z) = \frac{dP_m}{d\tilde{P}_m}(z)$$

for each calibration instance and test candidate. The normalized conformal weights for prediction set construction become, for $i = 1, \dots, n$,

$$p_i = \frac{w(Z_i)}{\sum_{j=1}^{n} w(Z_j) + w(z)}, \qquad p_{n+1} = \frac{w(z)}{\sum_{j=1}^{n} w(Z_j) + w(z)}.$$

Prediction sets are computed by evaluating the quantile over the weighted empirical distribution of nonconformity scores, yielding

$$\hat{C}_\alpha(x) = \Big\{ y : s(x, y) \le Q_{1-\alpha}\Big( \textstyle\sum_{i=1}^{n} p_i\, \delta_{s(Z_i)} + p_{n+1}\, \delta_{+\infty} \Big) \Big\},$$

where $s$ is the nonconformity measure and $Q_{1-\alpha}$ is the weighted quantile (Fan et al., 16 Dec 2025).
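The weighted quantile can be computed in a few lines of NumPy. A minimal sketch, assuming nonconformity scores and (unnormalized) weights are already evaluated; the function name and interface are illustrative:

```python
import numpy as np

def weighted_conformal_interval(scores_cal, w_cal, w_test, alpha):
    """(1 - alpha) weighted quantile of calibration scores, with the test
    point's weight placed on +infinity (standard weighted split-CP form).
    Returns the score threshold q; the set is {y : s(x, y) <= q}.
    """
    order = np.argsort(scores_cal)
    s = scores_cal[order]
    p = np.append(w_cal[order], w_test)
    p = p / p.sum()                        # normalized conformal weights
    cum = np.cumsum(p[:-1])                # mass on finite scores, sorted
    idx = np.searchsorted(cum, 1 - alpha)  # first k with mass >= 1 - alpha
    return np.inf if idx >= len(s) else s[idx]
```

With equal weights this reduces to the usual split-CP quantile; an infinite threshold signals that the weighted calibration mass below $1-\alpha$ cannot be reached, yielding the trivial set.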
3.2 Acceptance–Rejection Corrected CP
If the likelihood ratio $w$ is bounded, perform acceptance-rejection by sampling $U_i \sim \mathrm{Unif}(0,1)$; accept calibration point $i$ if $U_i \le w(Z_i)/B$, with $B \ge \sup_z w(z)$. The accepted subset follows the target law $P_m$ post-masking. Split CP is then run as usual on this subset (Fan et al., 16 Dec 2025).
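A minimal sketch of this acceptance-rejection step, assuming the likelihood ratios have already been evaluated on the calibration set (names are illustrative):

```python
import numpy as np

def arc_subset(scores_cal, w_cal, B, rng):
    """Acceptance-rejection correction: keep calibration point i with
    probability w_i / B (requires B >= max_i w_i). The accepted scores
    are distributed under the target masked law, so plain split CP
    can be run on them.
    """
    if np.any(w_cal > B):
        raise ValueError("B must upper-bound the likelihood ratios")
    u = rng.uniform(size=len(w_cal))
    keep = u <= w_cal / B
    return scores_cal[keep]
```

Note the tradeoff: a larger $B$ is safer but lowers the expected acceptance rate $\mathbb{E}[w]/B$, shrinking the effective calibration set.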
4. Theoretical Validity and Robustness Results
Exact Coverage Guarantees
- Weighted CP: Under absolute continuity and exchangeability, the weighted procedure provides exact mask-conditional validity: $\mathbb{P}\big(Y \in \hat{C}_\alpha(X_{\mathrm{obs}(m)}, m) \mid M = m\big) \ge 1 - \alpha$.
- ARC CP: Provided $w$ is bounded, acceptance-rejection creates a calibration set i.i.d. under $P_m$; split CP then ensures $\mathbb{P}\big(Y \in \hat{C}_\alpha \mid M = m\big) \ge 1 - \alpha$ (Fan et al., 16 Dec 2025).
Effect of Imperfect Estimation
If only an estimated ratio $\hat{w}$ (normalized to unit total mass) is available, the coverage is controlled up to a total variation penalty: $\mathbb{P}\big(Y \in \hat{C}_\alpha \mid M = m\big) \ge 1 - \alpha - \mathrm{TV}(\hat{P}_m, P_m)$, with $\mathrm{TV}(\hat{P}_m, P_m)$ the total variation distance between the estimated and true mask-conditional laws. The practical implication is that weight estimation should be accurate enough that this penalty is small relative to $\alpha$, so the empirical miscoverage remains acceptably small (Fan et al., 16 Dec 2025).
Necessity of Correction
Empirical ablation shows that omitting the correction step ("impute-mask-split" only) leads to under-coverage (e.g., worst-case mask-conditional coverage of about 89% when targeting 90%), while reweighting or ARC restores validity.
5. Empirical Evaluation and Performance
Experiments span both synthetic and real tabular datasets, evaluating the procedures under MCAR, MAR, and MNAR mechanisms. Key findings:
- Synthetic: For synthetic covariates under 50% MCAR missingness, both weighted CP and ARC CP attain nominal 90% mask-conditional coverage. They reduce average interval width by roughly 30% compared to the MDA-Nested baseline, which is substantially more conservative.
- Real-world: On datasets such as UCI Concrete (8 features, with entries missing at test time), Bike-sharing, and MEPS19, both procedures preserve desired coverage per mask and shrink prediction intervals relative to conservative alternatives.
- ARC vs. Weighted: ARC CP is computationally fast, reduces width further on some tasks, and does not break coverage when used with a slightly inflated bound $B$ (choosing $B$ so that the acceptance rate stays reasonably high is recommended).
A summary comparison of selected methods:
| Method | Mask-Conditional Valid | Average Width Reduction vs. Baseline |
|---|---|---|
| Weighted CP | Yes | 10–30% |
| ARC CP | Yes | Largest |
| MDA-Nested | Yes (conservative) | None (baseline) |
| Naive Split | No (under-covers) | Narrower than baseline |
6. Implementation and Practical Guidance
Integration into standard supervised pipelines is streamlined:
- Any off-the-shelf distributional imputer (e.g., MICE, Bayesian Ridge) fills the calibration set a single time.
- The test instance is not imputed; instead, the imputed calibration set is masked to match the test-time missingness.
- Correction is applied post hoc: either weighted CP or ARC CP is used to calibrate prediction sets from any base model.
- For weighted CP, a coarse search grid over candidate responses $y$ is recommended (grid spacing a small fraction of the response range).
- For ARC CP, $B$ should be only slightly larger than the estimated maximum of $w$, keeping the acceptance rate, and hence the effective calibration sample size, adequate.
- For estimating $w$, balanced classifier approaches (e.g., histogram-GBDT, logistic regression) with enough calibration data yield robust results; extremely poor weight estimation degrades coverage reliability (Fan et al., 16 Dec 2025).
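As one illustration of likelihood-ratio estimation, the sketch below uses a simple histogram density-ratio estimator on a one-dimensional feature. This is a toy stand-in for the balanced classifier estimators the paper recommends (histogram-GBDT, logistic regression); the function name and binning scheme are assumptions:

```python
import numpy as np

def estimate_weights_hist(z_source, z_target, bins=10):
    """Estimate w(z) ~= p_target(z) / p_source(z) on a 1-D feature by
    binning both samples on a shared grid. Returns w-hat at each
    source (calibration) point.
    """
    lo = min(z_source.min(), z_target.min())
    hi = max(z_source.max(), z_target.max())
    edges = np.linspace(lo, hi, bins + 1)
    c_s, _ = np.histogram(z_source, edges)
    c_t, _ = np.histogram(z_target, edges)
    dens_s = c_s / c_s.sum()
    dens_t = c_t / c_t.sum()
    # bin index of each source point (last bin is right-inclusive)
    idx = np.clip(np.searchsorted(edges, z_source, side="right") - 1,
                  0, bins - 1)
    return dens_t[idx] / np.clip(dens_s[idx], 1e-12, None)
```

The resulting $\hat{w}$ values plug directly into either the weighted-quantile or acceptance-rejection correction; the TV penalty above quantifies the cost of estimation error.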
7. Limitations and Prospective Directions
The primary limitations include the requirement for accurate estimation (or boundedness) of the likelihood ratio $w$; grossly inaccurate estimators or extreme unboundedness can break coverage or reduce acceptance rates in ARC CP. Weighted CP quantile searches may be computationally expensive for fine grids or high-dimensional responses. Potential future extensions involve adaptive tuning of $B$ for an improved efficiency/width tradeoff in ARC, online masking for streaming data, robustification against imperfect weights, and generalization to classification or structured-output tasks with missing entries (Fan et al., 16 Dec 2025).