Split & Weighted Conformal Prediction
- The method provides finite-sample prediction intervals by using data-splitting and weighted quantiles to adjust for covariate shift and sampling bias.
- Weighted conformal prediction utilizes importance sampling and group/mask-based strategies to correct for discrepancies between calibration and test distributions.
- Empirical results and theoretical guarantees show improved efficiency and robustness, yielding narrower prediction sets under varied distributional challenges.
Split and weighted conformal prediction refer to a class of nonparametric, distribution-free methods for uncertainty quantification in statistical prediction, specifically designed to provide finite-sample coverage guarantees even in the presence of covariate shift, class imbalance, missing data, or heterogeneity across groups. These methods generalize traditional conformal prediction by replacing the standard empirical quantiles of nonconformity scores with weighted quantiles, adjusting for discrepancies between the calibration and target (test) distributions. This framework includes compatibility with data-splitting (split conformal) and various weighting schemes, leveraging importance sampling and group membership to correct for sampling biases and distributional shifts.
1. Foundations of Split Conformal Prediction
The split conformal prediction procedure operates by dividing the available data into a training set and a calibration set $\{(x_i, y_i)\}_{i=1}^n$. A regression or classification model $\hat\mu$ (or, in some treatments, a generic score function) is fitted using the training set. For each example in the calibration set, a nonconformity score is computed; in regression, this is often the absolute residual $s_i = |y_i - \hat\mu(x_i)|$.
To make a prediction at a new covariate $x$, the interval
$$\hat C(x) = \big[\hat\mu(x) - \hat q,\; \hat\mu(x) + \hat q\big]$$
is output, where $\hat q$ is the $\lceil (1-\alpha)(n+1) \rceil / n$ empirical quantile of $s_1, \dots, s_n$. By exchangeability of the scores on the calibration set and the new test point's score, this construction satisfies the marginal coverage guarantee
$$\mathbb{P}\big(Y_{n+1} \in \hat C(X_{n+1})\big) \ge 1 - \alpha$$
without assumptions on the model (Tibshirani et al., 2019, Bhattacharyya et al., 30 Jan 2024).
This structure is central to subsequent weighted generalizations.
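The split procedure above can be sketched in a few lines. This is a minimal illustration, not a reference implementation; `fit` is a placeholder for any user-supplied model-fitting routine.

```python
import numpy as np

def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_test, alpha, fit):
    """Split conformal regression interval.

    `fit` stands in for any routine that returns a fitted predictor mu(x);
    the absolute residual is used as the nonconformity score, as in the text.
    """
    mu = fit(x_train, y_train)                   # model fitted on the training split
    scores = np.abs(y_cal - mu(x_cal))           # nonconformity scores on the calibration split
    n = len(scores)
    # finite-sample-corrected quantile level: ceil((1 - alpha)(n + 1)) / n
    level = min(np.ceil((1 - alpha) * (n + 1)) / n, 1.0)
    q_hat = np.quantile(scores, level, method="higher")
    pred = mu(x_test)
    return pred - q_hat, pred + q_hat            # [mu(x) - q, mu(x) + q]
```

With `alpha = 0.1`, the returned intervals have at least 90% marginal coverage, regardless of how well `fit` models the data.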
2. Weighted Conformal Prediction: Covariate Shift and Weighted Exchangeability
When the distribution of covariates in the calibration set ($P_X$) differs from that in the target/test set ($\tilde P_X$), i.e., under covariate shift, marginal validity can be lost. Weighted conformal prediction restores validity by incorporating importance weights. Specifically, the weight function is defined as the likelihood ratio
$$w(x) = \frac{d\tilde P_X(x)}{d P_X(x)}.$$
Under the covariate shift model, the (augmented) nonconformity scores plus the test point are "weighted exchangeable": their joint probability law can be factorized as a product of functions of each coordinate (the weights) times a symmetric function. This generalizes exchangeability and enables extending the conformal argument to the new setting (Tibshirani et al., 2019).
The weighted quantile replaces the ordinary empirical quantile, with
$$\hat q(x) = Q_{1-\alpha}\Big(\sum_{i=1}^n p_i^w(x)\,\delta_{s_i} + p_{n+1}^w(x)\,\delta_{+\infty}\Big), \qquad p_i^w(x) = \frac{w(x_i)}{\sum_{j=1}^n w(x_j) + w(x)}, \quad p_{n+1}^w(x) = \frac{w(x)}{\sum_{j=1}^n w(x_j) + w(x)}.$$
This weighted estimator guarantees
$$\mathbb{P}\big(Y_{n+1} \in \hat C(X_{n+1})\big) \ge 1 - \alpha,$$
provided the weights are known or accurately estimated (Tibshirani et al., 2019, Bhattacharyya et al., 30 Jan 2024).
The weighted-conformal procedure is summarized in the following steps (Tibshirani et al., 2019):
- Compute residuals $s_i = |y_i - \hat\mu(x_i)|$ and weights $w(x_i)$ for the calibration set.
- For each test point $x$, compute the weighted quantile $\hat q(x)$.
- Output $\hat C(x) = [\hat\mu(x) - \hat q(x),\; \hat\mu(x) + \hat q(x)]$.
Empirical results demonstrate restoration of nominal coverage under simulated covariate shift.
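The weighted quantile and the resulting interval can be sketched as follows. This is a minimal single-test-point illustration; the point mass at plus-infinity appended for the test point follows Tibshirani et al. (2019).

```python
import numpy as np

def weighted_quantile(values, weights, level):
    """Smallest v such that the weighted CDF at v is >= level."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cdf, level, side="left")]

def weighted_conformal_interval(mu, x_cal, y_cal, w_cal, x_test, w_test, alpha):
    """Weighted split conformal interval at one test point.

    w_cal[i] = w(x_i) and w_test = w(x_test) are the likelihood-ratio weights;
    a point mass at +inf is appended for the test point.
    """
    scores = np.abs(y_cal - mu(x_cal))
    q_hat = weighted_quantile(np.append(scores, np.inf),
                              np.append(w_cal, w_test), 1 - alpha)
    pred = float(np.atleast_1d(mu(np.atleast_1d(x_test)))[0])
    return pred - q_hat, pred + q_hat
```

With uniform weights this reduces to a (slightly conservative) ordinary split conformal interval.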
3. Generalizations: Group, Mask, and Frequency-based Weighting
Beyond classical covariate shift, weighted conformal prediction encompasses structures such as:
Group-weighted conformal prediction: When the population is partitioned into discrete groups with group distribution shifting from $q_k$ (calibration) to $\tilde q_k$ (test), the weights are $w_k = \tilde q_k / q_k$ for group $k$. The prediction set is calibrated using a group-weighted mixture distribution over conformal scores, $\sum_k \tilde q_k\, \hat F_k$, where $\hat F_k$ is the empirical score distribution within group $k$.
Coverage guarantees improve drastically over general weighting in settings with fixed, well-sampled groups (Bhattacharyya et al., 30 Jan 2024).
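The group-weighted calibration step above can be sketched as follows, assuming known group frequencies. The dictionaries `p_cal` and `p_test` are hypothetical interfaces, and placing the average calibration weight on the conservative point mass at infinity is a simplification for illustration.

```python
import numpy as np

def group_weighted_quantile(scores, groups, p_cal, p_test, alpha):
    """Group-weighted score quantile: each calibration point in group k
    carries weight w_k = p_test[k] / p_cal[k]; a point mass at +inf stands
    in for the test point (its weight, the average calibration weight,
    is an illustrative simplification).
    """
    w = np.array([p_test[g] / p_cal[g] for g in groups], dtype=float)
    s = np.append(np.asarray(scores, dtype=float), np.inf)
    w = np.append(w, w.mean())
    order = np.argsort(s)
    cdf = np.cumsum(w[order]) / w.sum()
    return s[order][np.searchsorted(cdf, 1 - alpha, side="left")]
```

Shifting test mass toward a group with larger scores raises the calibrated quantile, as expected.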
Mask-weighted conformal prediction under missing data: Given general missingness patterns encoded by masks $m$, weighted conformal prediction with mask-conditional weighting is deployed after calibration-set imputation and masking, with weights determined by the relative probability of each mask pattern under the calibration and test distributions.
Prediction sets constructed using these weights achieve both marginal and mask-conditional validity, extending guarantees to data missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) (Fan et al., 16 Dec 2025).
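Mask-pattern weights can be estimated from empirical pattern frequencies. The following is a hypothetical sketch; the function name and interface are illustrative, not the paper's API.

```python
import numpy as np

def mask_weights(masks_cal, masks_test):
    """Empirical mask-pattern weights w(m) = P_test(m) / P_cal(m).

    Each row of `masks_cal` / `masks_test` is a binary missingness
    indicator vector over the d features.
    """
    def pattern_freqs(masks):
        patterns, counts = np.unique(masks, axis=0, return_counts=True)
        return {tuple(p): c / len(masks) for p, c in zip(patterns, counts)}

    f_cal, f_test = pattern_freqs(masks_cal), pattern_freqs(masks_test)
    # one weight per calibration row; patterns unseen at test time get weight 0
    return np.array([f_test.get(tuple(m), 0.0) / f_cal[tuple(m)]
                     for m in masks_cal])
```

Calibration rows whose mask pattern is over-represented at test time are up-weighted accordingly.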
Class-label and frequency-based weighting in imbalanced/open-set settings: Selective sample splitting with probabilistic inclusion functions (as a function of class frequency) is used to prioritize rare classes in calibration. The resulting nonconformity scores are weighted according to explicit formulas based on label frequencies, restoring validity under severe class imbalance or when the label space expands at test time (Xie et al., 14 Oct 2025).
4. Weighted Aggregation of Multiple Conformity Scores
Weighted conformal prediction is not restricted to covariate-based weighting. In multi-class classification, prediction set size can be drastically reduced by aggregating multiple nonconformity (or conformity) scores in a weighted fashion. For a collection of scores $s_1, \dots, s_K$, the aggregated score $s_w(x, y) = \sum_{k=1}^K w_k\, s_k(x, y)$ is used in the split-conformal construction. Weights are chosen, via constrained optimization, to minimize average prediction set size subject to the coverage constraint:
$$\min_{w} \ \mathbb{E}\,|C_w(X)| \quad \text{subject to} \quad \mathbb{P}\big(Y \in C_w(X)\big) \ge 1 - \alpha,$$
where $C_w$ denotes the conformal prediction set generated with $s_w$ (Luo et al., 14 Jul 2024). Coverage is preserved marginally, and theoretical analysis establishes connections to subgraph classes in VC theory.
Empirical results on image classification demonstrate substantial set-size reductions for the same coverage, with the method outperforming all single-score conformal predictors.
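A grid-search sketch of the weight-selection step follows. This simplifies the constrained optimization described above to two scores and a validation-split size criterion; both simplifications are illustrative assumptions.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected split-conformal score threshold."""
    n = len(cal_scores)
    level = min(np.ceil((1 - alpha) * (n + 1)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

def best_aggregation_weight(s1_cal, s2_cal, S1_val, S2_val, alpha, grid=21):
    """Pick w in [0, 1] so the aggregated score w*s1 + (1-w)*s2 yields the
    smallest average prediction set on a validation split.

    s1_cal, s2_cal: scores of the true labels on the calibration split.
    S1_val, S2_val: (m, K) score matrices over all K candidate labels on
    the validation split. Grid search stands in for the constrained
    optimization in the referenced work.
    """
    best_w, best_size = 0.0, np.inf
    for w in np.linspace(0.0, 1.0, grid):
        q = conformal_threshold(w * s1_cal + (1 - w) * s2_cal, alpha)
        avg_size = (w * S1_val + (1 - w) * S2_val <= q).sum(axis=1).mean()
        if avg_size < best_size:
            best_w, best_size = w, avg_size
    return best_w, best_size
```

When one score is informative and the other is noise, the search concentrates weight on the informative score and the average set shrinks.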
5. Theoretical Guarantees and Effective Sample Size
Weighted conformal prediction provides strong, minimally-assumptive finite-sample guarantees:
- Marginal coverage: When the appropriate weights are known or accurately estimated, coverage at the nominal level $1 - \alpha$ is attained.
- Conditional/structured coverage: For group-weighted or mask-conditional weighting, coverage bounds match or improve on classical conformal prediction, with precise finite-sample deficits characterized in terms of group sizes or estimation error (Bhattacharyya et al., 30 Jan 2024, Fan et al., 16 Dec 2025).
- Robustness to estimated weights: Miscoverage due to estimated weights is bounded by a function of the error between the estimated and oracle weight functions; the effect can be quantitatively evaluated (Bhattacharyya et al., 30 Jan 2024).
- Effective sample size: The variance of the weights influences the effective sample size $n_{\mathrm{eff}} = \big(\sum_i w_i\big)^2 / \sum_i w_i^2$, impacting prediction set width particularly under severe covariate shift or imbalanced groups (Tibshirani et al., 2019).
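The effective-sample-size diagnostic is a one-liner; the Kish formula below is a common choice, used here as an illustrative proxy.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size (sum w)^2 / sum w^2: equals n for
    uniform weights and collapses toward 1 as weight variance grows."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)
```

A single dominant weight drives the effective sample size toward 1, which is why intervals widen under severe shift.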
6. Practical Considerations and Empirical Illustration
- Weight estimation: In practice, test-to-calibration likelihood ratios for covariate shift can be estimated by probabilistic classifiers trained to distinguish calibration from test covariates, e.g., logistic regression or random forests (Tibshirani et al., 2019). For group or mask settings, empirical group frequencies or mask-pattern classifiers are used.
- Computational cost: Split and weighted conformal prediction yield $O(n)$ per-query complexity after a single model fit, as only the $n$ calibration scores need be processed, in contrast to full conformal methods, which refit the model for each candidate response and can be $O(n^2)$ or higher (Tibshirani et al., 2019).
- Empirical performance: On both regression and classification tasks with covariate shift, group shift, imbalance or missingness, weighted procedures restore target coverage and often yield more informative (narrower) intervals or smaller prediction sets compared to standard split conformal, with efficiency scaling with the quality of weighting and sample size (Tibshirani et al., 2019, Fan et al., 16 Dec 2025, Bhattacharyya et al., 30 Jan 2024).
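The classifier-based weight-estimation step can be sketched with a from-scratch logistic regression. This is a minimal illustration assuming one-dimensional covariates; in practice an off-the-shelf probabilistic classifier would be used.

```python
import numpy as np

def estimate_weights(x_cal, x_test, iters=500, lr=0.1):
    """Estimate w(x) = p_test(x) / p_cal(x) with a from-scratch logistic
    regression distinguishing test (label 1) from calibration (label 0)
    covariates; assumes one-dimensional x for simplicity.
    By Bayes' rule, w(x) = (n_cal / n_test) * p(x) / (1 - p(x)).
    """
    X = np.concatenate([x_cal, x_test]).reshape(-1, 1)
    X = np.hstack([np.ones((len(X), 1)), X])          # intercept column
    y = np.concatenate([np.zeros(len(x_cal)), np.ones(len(x_test))])
    beta = np.zeros(2)
    for _ in range(iters):                            # gradient ascent on the log-likelihood
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (y - p) / len(y)
    p_cal = 1.0 / (1.0 + np.exp(-X[: len(x_cal)] @ beta))
    return (len(x_cal) / len(x_test)) * p_cal / (1.0 - p_cal)
```

When the test covariates are shifted upward, the estimated weights increase with $x$, up-weighting the calibration points that resemble the test distribution.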
The table below gives a summary of practical aspects:
| Scenario | Weight Type ($w$) | Coverage Guarantee |
|---|---|---|
| Covariate shift | $w(x) = d\tilde P_X(x)/dP_X(x)$ | marginal |
| Group shift | $w_k = \tilde q_k / q_k$ | marginal under group shift |
| Mask missingness | mask-dependent $w(m)$ | marginal and mask-conditional |
| Score aggregation (multi-class) | score weights $w_k$ | marginal |
7. Extensions and Connections
Weighted and split conformal prediction form the technical foundation for multiple recent developments in distribution-free predictive inference:
- Selective splitting and weighting for open-set and class-imbalanced problems, incorporating new label detection and Good-Turing estimation (Xie et al., 14 Oct 2025).
- Mask-conditional valid sets via reweighting and acceptance-rejection calibrated on imputed datasets for arbitrary missing data patterns (Fan et al., 16 Dec 2025).
- Localized, kernel-based conformal methods (split localized conformal prediction) delivering nearly conditional coverage through kernel weighted quantiles (Han et al., 2022).
- Group-weighted approaches yielding sharp finite-sample coverage in stratified or grouped data structures, outperforming generic density-ratio-based weighting when group membership drives the shift (Bhattacharyya et al., 30 Jan 2024).
These advances demonstrate the broad applicability of split and weighted conformal prediction, with theoretical and empirical support for nearly all modern predictive settings involving distribution shift, heterogeneity, imbalance, or missingness.