Reweighted Conformal Prediction Procedure
- The paper introduces a method that guarantees both marginal and mask-conditional coverage by adjusting prediction intervals after imputation.
- It employs a preimpute–mask–then–correct pipeline using importance weighting or acceptance-rejection to handle various missing data mechanisms like MCAR, MAR, and MNAR.
- Empirical evaluations demonstrate up to a 30% reduction in prediction interval width while maintaining reliable coverage across different datasets.
A reweighted conformal prediction procedure is a statistical method designed to provide reliable uncertainty quantification when features (covariates) in the dataset are subject to missingness. The foundational challenge it addresses is that classical conformal prediction (CP) does not guarantee valid coverage in the presence of missing covariates, especially when coverage must be controlled not only marginally, but for each possible missingness pattern (known as mask-conditional coverage, MCV). The reweighted conformal approach provides both marginal and mask-conditional coverage guarantees by correcting prediction intervals post-imputation, via importance reweighting or acceptance-rejection schemes. This enables compatibility with standard distributional imputation pipelines while delivering sharper, adaptively valid prediction sets even under complex missing data mechanisms, such as MCAR, MAR, or MNAR (Fan et al., 16 Dec 2025).
1. Problem Formulation and Background
Consider a supervised regression or classification framework where the data-generating process produces i.i.d. triples $(X, M, Y)$, with $X \in \mathbb{R}^d$ the covariate vector of dimension $d$, $M \in \{0,1\}^d$ a mask indicating missingness ($M_j = 1$ if $X_j$ is missing, $0$ otherwise), and $Y$ the response or label. The observed features are $X_{\mathrm{obs}(M)}$, with unobserved entries replaced by NA. In this context, conformal prediction aims to construct a prediction set $\hat{C}_\alpha$ such that, for new data,

$$\mathbb{P}\big(Y \in \hat{C}_\alpha(X_{\mathrm{obs}(M)}, M) \,\big|\, M = m\big) \ge 1 - \alpha$$

for all masks $m$ with nonzero probability mass. Standard split conformal prediction furnishes only marginal validity (MV) under exchangeability; mask-conditional validity (MCV) is generally not guaranteed, especially under heteroskedastic or non-random missing patterns (Fan et al., 16 Dec 2025).
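As a concrete check of the MCV target above, mask-conditional coverage can be estimated empirically by stratifying interval hits by missingness pattern. A minimal NumPy sketch; the function name and interval representation are illustrative, not from the paper:

```python
import numpy as np

def mask_conditional_coverage(y, lower, upper, masks):
    """Empirical coverage of intervals [lower, upper], stratified by mask.

    masks: (n, d) boolean array; each row is a pattern m (True = missing).
    Returns a dict mapping each observed pattern to its coverage rate.
    """
    covered = (lower <= y) & (y <= upper)
    out = {}
    for m in np.unique(masks, axis=0):           # iterate distinct patterns
        idx = (masks == m).all(axis=1)           # rows with this pattern
        out[tuple(m)] = covered[idx].mean()
    return out
```

MCV requires every value in the returned dict to be at least $1-\alpha$, not just their average.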
2. The Preimpute–Mask–Then–Correct Framework
To handle missing covariates, the framework consists of the following pipeline:
- Preimpute: Each calibration sample $(X_i, M_i, Y_i)$ is subjected to distributional imputation via any probabilistic mechanism $\mathcal{I}$, yielding an imputed vector $\tilde{X}_i \sim \mathcal{I}(\cdot \mid X_{i,\mathrm{obs}(M_i)}, M_i)$ that agrees with $X_i$ on the observed entries.
- Mask: For a test-time missingness pattern (mask) $m$, each imputed calibration instance is re-masked so that only the features indicated as observed by $m$ are retained.
- Correct: A correction step is applied to address the distributional shift between the law of the imputed-masked calibration set ($\tilde{P}_m$) and the true mask-conditional law ($P_m$). This is achieved either by reweighting calibration points via the likelihood ratio $w = dP_m / d\tilde{P}_m$, or by acceptance-rejection sampling to simulate draws from $P_m$.
A schematic table of the high-level steps:
| Step | Description | Output |
|---|---|---|
| Preimpute | Draw imputed $\tilde{X}_i$ for each calibration point $i$ | Completed calibration set |
| Mask | Apply the test mask $m$ to each $\tilde{X}_i$ | Masked inputs $\tilde{X}_{i,\mathrm{obs}(m)}$ |
| Correct | Apply weighting or rejection sampling for coverage | Adjusted calibration scores |
This method leverages the calibration regime of split CP and is agnostic to the specific imputation mechanism $\mathcal{I}$.
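The preimpute-then-mask steps can be sketched as follows; the `imputer_sample` interface is a hypothetical stand-in for any distributional imputer, and the function name is illustrative:

```python
import numpy as np

def preimpute_mask(X_cal, M_cal, m_test, imputer_sample, rng):
    """Preimpute-mask step: fill each calibration row probabilistically,
    then re-mask it to the test-time pattern m_test (True = missing).

    imputer_sample(x, mask, rng) -> one imputed draw of the full vector;
    any distributional imputer can be plugged in (assumed interface).
    """
    X_imp = np.array([imputer_sample(x, m, rng)
                      for x, m in zip(X_cal, M_cal)])
    X_masked = X_imp.copy()
    X_masked[:, m_test] = np.nan   # hide the features missing at test time
    return X_masked
```

The correction step (weighting or rejection) is then applied to nonconformity scores computed on this masked calibration set.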
3. Weighted and Acceptance–Rejection Correction Methods
3.1 Weighted Conformal Prediction
Under the absolute-continuity assumption $P_m \ll \tilde{P}_m$, compute the likelihood ratio (importance weight)

$$w(z) = \frac{dP_m}{d\tilde{P}_m}(z)$$

for each calibration instance and test candidate. The normalized conformal weights for prediction set construction become, for $i = 1, \dots, n$,

$$p_i = \frac{w(Z_i)}{\sum_{j=1}^{n} w(Z_j) + w(z)}, \qquad p_{n+1} = \frac{w(z)}{\sum_{j=1}^{n} w(Z_j) + w(z)}.$$

Prediction sets are computed by evaluating the quantile over the weighted empirical distribution of nonconformity scores, yielding

$$\hat{C}_\alpha(x) = \Big\{ y : s(x, y) \le Q_{1-\alpha}\Big( \textstyle\sum_{i=1}^{n} p_i\, \delta_{s(Z_i)} + p_{n+1}\, \delta_{+\infty} \Big) \Big\},$$

where $s$ is the nonconformity measure and $Q_{1-\alpha}$ is the weighted quantile (Fan et al., 16 Dec 2025).
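The weighted quantile can be computed in a few lines of NumPy. A minimal sketch, assuming nonconformity scores and (unnormalized) weights are already evaluated; the function name and interface are illustrative:

```python
import numpy as np

def weighted_conformal_interval(scores_cal, w_cal, w_test, alpha):
    """(1 - alpha) weighted quantile of calibration scores, with the test
    point's weight placed on +infinity (standard weighted split-CP form).
    Returns the score threshold q; the set is {y : s(x, y) <= q}.
    """
    order = np.argsort(scores_cal)
    s = scores_cal[order]
    p = np.append(w_cal[order], w_test)
    p = p / p.sum()                        # normalized conformal weights
    cum = np.cumsum(p[:-1])                # mass on finite scores, sorted
    idx = np.searchsorted(cum, 1 - alpha)  # first k with mass >= 1 - alpha
    return np.inf if idx >= len(s) else s[idx]
```

With equal weights this reduces to the usual split-CP quantile; an infinite threshold signals that the weighted calibration mass below $1-\alpha$ cannot be reached, yielding the trivial set.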
3.2 Acceptance–Rejection Corrected CP
If the likelihood ratio $w$ is bounded, perform acceptance-rejection by sampling $U_i \sim \mathrm{Unif}(0,1)$; accept calibration point $i$ if $U_i \le w(Z_i)/B$, with $B \ge \sup_z w(z)$. The accepted subset follows the target law $P_m$ post-masking. Split CP is then run as usual on this subset (Fan et al., 16 Dec 2025).
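A minimal sketch of this acceptance-rejection step, assuming the likelihood ratios have already been evaluated on the calibration set (names are illustrative):

```python
import numpy as np

def arc_subset(scores_cal, w_cal, B, rng):
    """Acceptance-rejection correction: keep calibration point i with
    probability w_i / B (requires B >= max_i w_i). The accepted scores
    are distributed under the target masked law, so plain split CP
    can be run on them.
    """
    if np.any(w_cal > B):
        raise ValueError("B must upper-bound the likelihood ratios")
    u = rng.uniform(size=len(w_cal))
    keep = u <= w_cal / B
    return scores_cal[keep]
```

Note the tradeoff: a larger $B$ is safer but lowers the expected acceptance rate $\mathbb{E}[w]/B$, shrinking the effective calibration set.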
4. Theoretical Validity and Robustness Results
Exact Coverage Guarantees
- Weighted CP: Under absolute continuity and exchangeability, the weighted procedure provides exact mask-conditional validity: $\mathbb{P}\big(Y \in \hat{C}_\alpha(X_{\mathrm{obs}(m)}, m) \mid M = m\big) \ge 1 - \alpha$.
- ARC CP: Provided $w$ is bounded, acceptance-rejection creates a calibration set i.i.d. under $P_m$; split CP then ensures $\mathbb{P}\big(Y \in \hat{C}_\alpha \mid M = m\big) \ge 1 - \alpha$ (Fan et al., 16 Dec 2025).
Effect of Imperfect Estimation
If only an estimated ratio $\hat{w}$ (normalized to unit total mass) is available, the coverage is controlled up to a total variation penalty: $\mathbb{P}\big(Y \in \hat{C}_\alpha \mid M = m\big) \ge 1 - \alpha - \mathrm{TV}(\hat{P}_m, P_m)$, with $\mathrm{TV}(\hat{P}_m, P_m)$ the total variation distance between the estimated and true mask-conditional laws. The practical implication is that weight estimation should be accurate enough that this penalty is small relative to $\alpha$, so the empirical miscoverage remains acceptably small (Fan et al., 16 Dec 2025).
Necessity of Correction
Empirical ablation shows that omitting the correction step ("impute-mask-split" only) leads to under-coverage (e.g., worst-case mask-conditional coverage of about 89% when targeting 90%), while reweighting or ARC restores validity.
5. Empirical Evaluation and Performance
Experiments span both synthetic and real tabular datasets, evaluating the procedures under MCAR, MAR, and MNAR mechanisms. Key findings:
- Synthetic: For synthetic covariates under 50% MCAR missingness, both weighted CP and ARC CP attain nominal 90% mask-conditional coverage. They reduce average interval width by roughly 30% compared to the MDA-Nested baseline, which is substantially more conservative.
- Real-world: On datasets such as UCI Concrete (8 features, with entries missing at test time), Bike-sharing, and MEPS19, both procedures preserve desired coverage per mask and shrink prediction intervals relative to conservative alternatives.
- ARC vs. Weighted: ARC CP is computationally fast, reduces width further on some tasks, and does not break coverage when used with a slightly inflated bound $B$ (choosing $B$ so that the acceptance rate stays reasonably high is recommended).
A summary comparison of selected methods:
| Method | Mask-Conditional Valid | Average Width Reduction vs. Baseline |
|---|---|---|
| Weighted CP | Yes | 10–30% |
| ARC CP | Yes | Largest |
| MDA-Nested | Yes (conservative) | None (baseline) |
| Naive Split | No (under-covers) | Narrower than baseline |
6. Implementation and Practical Guidance
Integration into standard supervised pipelines is streamlined:
- Any off-the-shelf distributional imputer (e.g., MICE, Bayesian Ridge) fills the calibration set a single time.
- The test instance is not imputed; instead, the imputed calibration set is masked to match the test-time missingness.
- Correction is applied post hoc: either weighted CP or ARC CP is used to calibrate prediction sets from any base model.
- For weighted CP, a coarse search grid over candidate responses $y$ is recommended (grid spacing a small fraction of the response range).
- For ARC CP, $B$ should be only slightly larger than the estimated maximum of $w$, keeping the acceptance rate, and hence the effective calibration sample size, adequate.
- For estimating $w$, balanced classifier approaches (e.g., histogram-GBDT, logistic regression) with enough calibration data yield robust results; extremely poor weight estimation degrades coverage reliability (Fan et al., 16 Dec 2025).
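As one illustration of likelihood-ratio estimation, the sketch below uses a simple histogram density-ratio estimator on a one-dimensional feature. This is a toy stand-in for the balanced classifier estimators the paper recommends (histogram-GBDT, logistic regression); the function name and binning scheme are assumptions:

```python
import numpy as np

def estimate_weights_hist(z_source, z_target, bins=10):
    """Estimate w(z) ~= p_target(z) / p_source(z) on a 1-D feature by
    binning both samples on a shared grid. Returns w-hat at each
    source (calibration) point.
    """
    lo = min(z_source.min(), z_target.min())
    hi = max(z_source.max(), z_target.max())
    edges = np.linspace(lo, hi, bins + 1)
    c_s, _ = np.histogram(z_source, edges)
    c_t, _ = np.histogram(z_target, edges)
    dens_s = c_s / c_s.sum()
    dens_t = c_t / c_t.sum()
    # bin index of each source point (last bin is right-inclusive)
    idx = np.clip(np.searchsorted(edges, z_source, side="right") - 1,
                  0, bins - 1)
    return dens_t[idx] / np.clip(dens_s[idx], 1e-12, None)
```

The resulting $\hat{w}$ values plug directly into either the weighted-quantile or acceptance-rejection correction; the TV penalty above quantifies the cost of estimation error.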
7. Limitations and Prospective Directions
The primary limitations include the requirement for accurate estimation (or boundedness) of the likelihood ratio $w$; grossly inaccurate estimators or extreme unboundedness can break coverage or reduce acceptance rates in ARC CP. Weighted CP quantile searches may be computationally expensive for fine grids or high-dimensional responses. Potential future extensions involve adaptive tuning of $B$ for an improved efficiency/width tradeoff in ARC, online masking for streaming data, robustification against imperfect weights, and generalization to classification or structured-output tasks with missing entries (Fan et al., 16 Dec 2025).