Group-Weighted Conformal Prediction (GWCP)

Updated 15 January 2026

GWCP is a conformal inference method that reweights nonconformity scores based on group structure or similarity to improve local coverage validity.
It integrates discrete group stratification, kernel-based spatial weighting, and model aggregation to address heterogeneity and covariate shift.
Analytical and empirical results indicate that GWCP achieves nearly exact finite-sample coverage with improved group-wise calibration in real-world and synthetic scenarios.

Group-Weighted Conformal Prediction (GWCP) refers to a family of conformal inference methods in which calibration residuals or nonconformity scores are reweighted according to pre-specified group structure, data-driven similarity, or covariate shift adjustments. The reweighting allows for improved local validity and adaptive coverage in heterogeneous, nonexchangeable, or stratified settings. GWCP unifies discrete group reweighting (strata, administrative units), kernel-based spatial weighting, and weighted aggregation across models or data sources within the conformal prediction framework, enabling both distribution-free marginal guarantees and improved conditional (group-wise) calibration. Analytical and empirical results indicate that GWCP can deliver nearly exact finite-sample guarantees in stratified/discrete-shift regimes and substantially more uniform coverage across groups or spatial regions.

1. Foundations of Group-Weighted Conformal Prediction

Conformal prediction (CP) provides valid marginal finite-sample coverage for prediction intervals or sets, assuming exchangeable calibration and test data. Given a pre-trained predictor $\hat{f}$ and a calibration set $\{(X_i,Y_i)\}_{i=1}^n$, standard split CP for regression forms nonconformity scores $\alpha_i = |Y_i - \hat{f}(X_i)|$ and defines the prediction interval for a new $X_{n+1}$ as $[\hat{f}(X_{n+1}) \pm \hat{q}_{1-\alpha}]$, with $\hat{q}_{1-\alpha}$ the empirical $(1-\alpha)$-quantile of $\{\alpha_i\}$ (in finite samples, the $\lceil (1-\alpha)(n+1) \rceil$-th smallest score). This yields coverage of at least $1-\alpha$ under exchangeability (Hjort et al., 2023, Bhattacharyya et al., 2024). Here the subscripted $\alpha_i$ denote scores, while the unsubscripted $\alpha$ is the target miscoverage level.
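The split CP construction can be illustrated with a minimal sketch in plain Python. The function name `split_conformal_interval` and the toy residuals are illustrative assumptions, not taken from the cited papers; the sketch uses the usual finite-sample rank $\lceil (1-\alpha)(n+1) \rceil$ and returns an unbounded interval when the calibration set is too small for the requested level.

```python
import math

def split_conformal_interval(residuals, y_hat_new, alpha=0.1):
    """Symmetric split-conformal interval around a point prediction.

    residuals: absolute calibration residuals |Y_i - f_hat(X_i)|.
    y_hat_new: point prediction f_hat(X_{n+1}) for the new input.
    """
    n = len(residuals)
    # Finite-sample rank: the ceil((1 - alpha) * (n + 1))-th smallest residual.
    k = math.ceil((1 - alpha) * (n + 1))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q_hat = sorted(residuals)[k - 1]
    return (y_hat_new - q_hat, y_hat_new + q_hat)

# Toy calibration residuals (illustrative values, not from any cited study).
calib_residuals = [0.5, 1.2, 0.3, 2.0, 0.8, 1.5, 0.1, 0.9, 1.1]
interval = split_conformal_interval(calib_residuals, y_hat_new=10.0, alpha=0.2)
print(interval)
```

Every calibration point contributes equally here, which is the behavior the weighted variants below modify.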

GWCP extends CP by assigning weights $w_i$ to calibration points, typically encoding group membership, spatial proximity, importance, or similarity derived from learned models such as quantile regression forests. The prediction interval then solves a weighted quantile problem: the interval at $X_{n+1}$ takes the form $[\hat{f}(X_{n+1}) \pm q_{1-\alpha}^{(w)}/w_{n+1}]$, where $q_{1-\alpha}^{(w)}$ is the smallest $q$ such that the weighted cumulative sum $\sum_{i:\alpha_i^{(w)} \le q}w_i$ is at least $(1-\alpha)\sum_{i=1}^n w_i$, with weighted scores $\alpha_i^{(w)} = w_i\alpha_i$ and $w_{n+1}$ the weight assigned to the test point (Hjort et al., 2023, Barber et al., 2022).

2. Motivations: Heterogeneity, Covariate Shift, and Group Structure

Classical CP intervals can suffer from non-uniform coverage across groups, time periods, or regions: intervals may be systematically too liberal or too conservative in specific subpopulations or locations, especially when group (or spatial) structure induces heterogeneity in prediction error. Further, the exchangeability assumption can be violated under covariate shift, compromising the nominal coverage.

GWCP directly addresses these pathologies:

Group stratification and administrative units: For example, in real estate appraisal, coverage gaps are observed between urban districts due to spatial variation in error structure. GWCP, with group- or kernel-based weights, restores local coverage uniformity (Hjort et al., 2023).
Covariate shift: When the test covariate law differs from the calibration law but $P_{Y\mid X}$ is invariant, GWCP (a specialization of weighted conformal methods) can provide sharp finite-sample guarantees as long as the shift is purely in discrete group membership (Bhattacharyya et al., 2024).
Fairness and controlled miscoverage: GWCP enables group-wise control of miscoverage rates, underpinning recent methods for equalized coverage under demographic or distribution shift (Alpay et al., 2025, Wong et al., 2025).
Mixture-of-experts and model aggregation: Aggregating through group- or model-wise p-values with data-dependent weights interpolates between global and local validity, reducing “worst-slice” undercoverage (Wong et al., 2025).

3. Methodologies: Weighting Schemes and Algorithms

3.1. Calibration Weighting Approaches

Indicator (Mondrian) Group Weights: $w_i = 1$ if $g_i = g_{n+1}$, and $w_i = 0$ otherwise. This recovers separate, group-wise conformal prediction (Mondrian CP) (Hjort et al., 2023).
Kernel or Proximity Weights: $w_i = \exp(-d(X_i, X_{n+1})^2/\eta)$, where the decay parameter $\eta$ is tuned to balance locality and stability (Hjort et al., 2023).
Quantile Regression Forest Weights: $w_i(x)$ learned by a QRF, reflecting similarity between calibration points and $x$, used to compute a localized, adaptive conformal quantile (Amoukou et al., 2023).
Likelihood-Ratio or Importance Weights: $w_k = q_k/p_k$ for shift across discrete groups, where $p_k$ and $q_k$ are the calibration and test probabilities of group $k$; more generally, density-ratio weights estimated via binary classification or other methods (Bhattacharyya et al., 2024, Alpay et al., 2025).
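Three of these schemes are simple enough to sketch directly. The snippet below is an illustration under stated assumptions rather than code from the cited papers: the function names are hypothetical, $d$ is taken to be Euclidean distance, and the calibration probabilities $p_k$ are replaced by empirical group frequencies. QRF weights are omitted because they require a fitted forest.

```python
import math
from collections import Counter

def indicator_weights(calib_groups, test_group):
    """Mondrian weights: 1 for calibration points in the test group, else 0."""
    return [1.0 if g == test_group else 0.0 for g in calib_groups]

def gaussian_kernel_weights(calib_points, test_point, eta):
    """w_i = exp(-d(X_i, X_new)^2 / eta), with d assumed Euclidean."""
    weights = []
    for x in calib_points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, test_point))
        weights.append(math.exp(-sq_dist / eta))
    return weights

def group_ratio_weights(calib_groups, test_probs):
    """w_i = q_k / p_hat_k for the group k of point i.

    p_hat_k is the empirical calibration frequency of group k (an assumption
    of this sketch); test_probs maps each group to its test probability q_k.
    """
    n = len(calib_groups)
    counts = Counter(calib_groups)
    return [test_probs[g] / (counts[g] / n) for g in calib_groups]

# Toy data (illustrative only).
groups = ["a", "a", "a", "b"]
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]

mondrian = indicator_weights(groups, test_group="b")
kernel = gaussian_kernel_weights(points, test_point=(0.0, 0.0), eta=2.0)
ratio = group_ratio_weights(groups, test_probs={"a": 0.5, "b": 0.5})
```

In the ratio example, group "b" is under-represented in calibration relative to the test distribution, so its single point receives a larger weight than the points in group "a".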

3.2. Generalized GWCP Algorithmic Outline

For a target miscoverage $\alpha$, calibration set $\{(X_i, Y_i)\}_{i=1}^n$, and weights $w_i$:

Compute residuals $r_i = |Y_i - \hat{f}(X_i)|$ or other nonconformity scores.
Assign/calibrate $w_i$ based on group structure, proximity, or other criteria.
Form weighted scores $\alpha_i^{(w)} = w_i r_i$ or equivalent.
Compute the threshold $q_{1-\alpha}^{(w)}$ as the smallest value such that $\sum_{i:\alpha_i^{(w)} \leq q_{1-\alpha}^{(w)}} w_i \ge (1-\alpha)\sum_{i=1}^n w_i$.
Return interval $[\hat{f}(X_{n+1}) \pm q_{1-\alpha}^{(w)}/w_{n+1}]$ (Hjort et al., 2023, Bhattacharyya et al., 2024).
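The steps above can be transcribed almost literally into plain Python. The following is a sketch under stated assumptions, not a reference implementation: `weighted_quantile` and `gwcp_interval` are hypothetical names, the weights are supplied by the caller, and the threshold follows the outline as written, without the $(n+1)$ finite-sample correction or the extra test-point mass used in some weighted conformal formulations.

```python
def weighted_quantile(scores, weights, level):
    """Smallest score q whose cumulative weight over {scores <= q}
    reaches level * (total weight)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have positive total mass")
    cum = 0.0
    for s, w in sorted(zip(scores, weights)):
        cum += w
        if cum >= level * total:
            return s
    return max(scores)  # guard against floating-point round-off

def gwcp_interval(residuals, weights, y_hat_new, w_new, alpha=0.1):
    """Interval f_hat(X_new) +/- q / w_new from weighted scores w_i * r_i."""
    scaled = [w * r for w, r in zip(weights, residuals)]   # weighted scores
    q = weighted_quantile(scaled, weights, 1 - alpha)      # threshold
    half_width = q / w_new
    return (y_hat_new - half_width, y_hat_new + half_width)

# Toy example with indicator (Mondrian) weights for test group "b".
residuals = [0.5, 1.2, 0.3, 2.0, 0.8, 1.5]
groups = ["a", "b", "a", "b", "a", "b"]
weights = [1.0 if g == "b" else 0.0 for g in groups]
interval_b = gwcp_interval(residuals, weights, y_hat_new=10.0, w_new=1.0, alpha=0.2)
print(interval_b)
```

With indicator weights, points outside the test group carry zero mass, so the threshold depends only on the residuals of group "b", matching the Mondrian special case.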

Alternative formulations exist for combining prediction sets over multiple models or data sources using weighted aggregation of conformal p-values, with finite-sample inflation factors depending on the maximum group weight (Wong et al., 2025).

4. Theoretical Guarantees: Marginal and Groupwise Coverage

Weighted conformal prediction retains valid marginal coverage if the weighted calibration residuals satisfy a form of weighted exchangeability; for fixed, group-based weights, this reduces to controlling the empirical distribution of residuals within each group (Bhattacharyya et al., 2024, Barber et al., 2022). In the group-shift setting with $K$ groups, under random sampling,

$\Pr\{Y_{n+1} \in \widehat C_n(X_{n+1})\} \ge 1-\alpha - \mathbb{E}\left[\max_{k:n_k>0}\frac{q_k}{n_k}\right],$

where $n_k$ is the number of calibration points in group $k$ and $q_k$ is the test probability of group $k$ (Bhattacharyya et al., 2024). When every group is well represented, so that each $n_k$ grows proportionally to $n$, the coverage shortfall vanishes as $O(1/n)$.
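To get a feel for the size of the slack term, the quantity inside the expectation can be evaluated for one realized set of group counts. This is only a plug-in illustration: the bound above takes an expectation over the random counts $n_k$, and the function name and numbers below are hypothetical.

```python
def coverage_lower_bound(alpha, test_probs, calib_counts):
    """Evaluate 1 - alpha - max_k q_k / n_k for one realization of the counts.

    test_probs: group -> test probability q_k.
    calib_counts: group -> number of calibration points n_k.
    Groups with n_k == 0 are skipped, as in the displayed bound.
    """
    slack = max(
        test_probs[k] / n_k for k, n_k in calib_counts.items() if n_k > 0
    )
    return 1 - alpha - slack

# Hypothetical three-group setting: group "c" is small in calibration.
q = {"a": 0.5, "b": 0.3, "c": 0.2}
n_k = {"a": 500, "b": 300, "c": 50}
bound = coverage_lower_bound(alpha=0.1, test_probs=q, calib_counts=n_k)
print(round(bound, 4))
```

The slack is driven by the group with the largest ratio $q_k/n_k$, here the sparsely sampled group "c", which is why poorly represented groups dominate the shortfall.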

Group-conditional coverage,

$\Pr\{Y_{n+1} \in \widehat C_n(X_{n+1}) \mid X_{n+1} \in \mathcal X_k\} \approx 1-\alpha,$

is achieved empirically by reweighting so that calibration points similar to the test input dominate the quantile. Bounds under nonexchangeability are given by total variation distances between calibration/test points and are controlled via the selection of $w_i$ (Barber et al., 2022, Alpay et al., 2025). For group-based weights under covariate shift, groupwise coverage degrades gracefully as $O(\sqrt{(1+B_a)/n_a})$, where $B_a$ is the group-specific second moment of the weights (Alpay et al., 2025).
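Group-conditional coverage is usually checked empirically on held-out data by computing the hit rate within each group. A minimal sketch, assuming the intervals have already been produced by some conformal procedure (the function name and toy values are illustrative):

```python
from collections import defaultdict

def groupwise_coverage(y_true, intervals, groups):
    """Fraction of test points whose outcome falls inside its interval, per group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for y, (lo, hi), g in zip(y_true, intervals, groups):
        totals[g] += 1
        hits[g] += int(lo <= y <= hi)
    return {g: hits[g] / totals[g] for g in totals}

# Toy held-out set (illustrative values only).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
intervals = [(0.0, 2.0), (2.5, 3.0), (2.0, 4.0), (3.0, 5.0), (4.0, 6.0)]
groups = ["west", "west", "east", "east", "east"]

coverage = groupwise_coverage(y_true, intervals, groups)
print(coverage)
```

Comparing these per-group rates against the nominal $1-\alpha$ exposes the kind of systematic under- or overcoverage that a single marginal coverage figure hides.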

5. Empirical Evidence and Applications

Extensive experimentation validates GWCP in both synthetic and real-world scenarios:

Spatial real estate data: In Oslo housing data, standard CP achieves nominal marginal coverage but exhibits systematic shortfalls and excesses by district (e.g., undercoverage in the affluent west, overcoverage in the east). GWCP with groupwise, spatial kernel, or nearest-neighbor weights greatly reduces these gaps, returning near-nominal per-district coverage at a modest (5–15%) increase in average interval width (Hjort et al., 2023).
Simulated group/covariate shift: GWCP demonstrates rapid convergence of the coverage bound $1-\alpha-\max_k(q_k/n_k)$ as the number of groups $K$ or calibration size grows, outperforming standard WCP under stratified sampling (Bhattacharyya et al., 2024, Ying et al., 2024).
UCI regression/heteroskedastic data: Adaptive GWCP variants (QRF-based) yield nearly optimal interval sizing and substantial reduction in conditional coverage error (Amoukou et al., 2023).
Mixture-of-experts/classification: Weighted p-value aggregation (slice/group-weighted) recovers “worst-slice” coverage to near-nominal without gross set-size inflation (Wong et al., 2025).

Empirical efficiency is affected by group calibration set size and weight selection; too fine partitions may sacrifice informativeness, while kernel bandwidths can be tuned to balance locality and stability of intervals (Hjort et al., 2023).

6. Connections to Related Methods

GWCP generalizes and connects multiple methodologies:

Mondrian CP: Special case with hard partitioning by group (Hjort et al., 2023).
Weighted conformal under covariate shift: GWCP encompasses and sharpens WCP in cases where the likelihood-ratio is piecewise constant across groups (Bhattacharyya et al., 2024), and clarifies efficiency/informativeness trade-offs for multi-source scenarios (Ying et al., 2024).
Model aggregation via weighted p-values: Unified combination of prediction sets or p-values from multiple granularities or expert models, with finite-sample inflation directly linked to the weighting scheme (Wong et al., 2025).
Fairness and covariate shift: Importance-weighted calibration, as in $C^3F$, enables group-conditional coverage under distributional change, with explicit coverage-parity bounds (Alpay et al., 2025).
Nonexchangeable and nonsymmetric inference: GWCP provides coverage under nonexchangeability by prioritizing more “relevant” or “reliable” points via the weight vector, and incorporates a randomization procedure when using nonsymmetric learning algorithms (Barber et al., 2022).

7. Practical Considerations and Limitations

Practical application of GWCP involves the following considerations:

Accurate estimation of group or importance weights: For group shift, weights $q_k/p_k$ can be accurately estimated from calibration data; for continuous covariate shift, misestimation may degrade coverage (Bhattacharyya et al., 2024, Alpay et al., 2025).
Trade-off between granularity and informativeness: Too fine groupings or sharply decaying kernels can reduce effective calibration set size and inflate prediction interval width (Hjort et al., 2023, Ying et al., 2024).
Computational cost: QRF-derived groupings and weighted quantile computation can be efficient when groups are small, but scalability considerations arise for large calibration sets (Amoukou et al., 2023).
Graceful degradation under shift: Coverage guarantees degrade only with the $L_2$ norm or second moment of weights in group-weighted CP, compared to $L_1$ or TV distance in generic WCP (Alpay et al., 2025).
Selection of miscoverage budget splits and kernel bandwidth: Hyperparameters should be tuned to minimize local coverage variance or maximize uniformity.
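One common way to quantify how much calibration information survives a given weighting is Kish's effective sample size, $(\sum_i w_i)^2 / \sum_i w_i^2$. This diagnostic is a standard survey-sampling quantity rather than something prescribed by the cited papers, and the helper below is a hypothetical sketch of how it might be used when choosing a kernel decay parameter $\eta$.

```python
import math

def effective_sample_size(weights):
    """Kish's effective sample size: (sum w)^2 / sum w^2."""
    total = sum(weights)
    sq_total = sum(w * w for w in weights)
    if sq_total == 0:
        return 0.0
    return total * total / sq_total

# Calibration points at fixed distances from the test point (toy setup).
distances = [0.0, 1.0, 2.0, 3.0, 4.0]
for eta in (0.5, 5.0, 50.0):
    w = [math.exp(-d ** 2 / eta) for d in distances]
    print(f"eta={eta:5.1f}  ESS={effective_sample_size(w):.2f}")
```

Uniform weights give an effective sample size of $n$, and indicator weights give the size of the test group; sharply decaying kernels push the value toward 1, which signals wide or unstable intervals.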

When groups are large and well-represented, GWCP attains coverage rates approaching $1-\alpha$ rapidly with growing $n$ and/or $K$ (Bhattacharyya et al., 2024). With small or poorly represented groups, a correction to miscoverage levels or additional calibration may be required for exact guarantees.

References

"Uncertainty quantification in automated valuation models with spatially weighted conformal prediction" (Hjort et al., 2023)
"Group-Weighted Conformal Prediction" (Bhattacharyya et al., 2024)
"Adaptive Conformal Prediction by Reweighting Nonconformity Score" (Amoukou et al., 2023)
"Conformal prediction beyond exchangeability" (Barber et al., 2022)
"Improving Coverage in Combined Prediction Sets with Weighted p-values" (Wong et al., 2025)
"Calibrated Counterfactual Conformal Fairness ($C^3F$)" (Alpay et al., 2025)
"Informativeness of Weighted Conformal Prediction" (Ying et al., 2024)