Regression-Based Split Conformal Prediction
- The framework splits data into training and calibration sets, using nonconformity scores to construct prediction intervals with finite-sample coverage guarantees.
- It adapts to local data features by employing normalized scores and kernel-localized calibration to manage heteroscedasticity and complex error structures.
- Innovations such as trimmed conformal prediction and multi-split aggregation improve computational and statistical efficiency while preserving coverage guarantees.
Regression-based split conformal prediction (CP) frameworks provide rigorous finite-sample, distribution-free guarantees for uncertainty quantification in regression tasks. Unlike traditional regression methods that yield point estimates or asymptotic confidence intervals, split CP constructs data-driven prediction intervals or sets that maintain prescribed coverage probabilities with minimal distributional assumptions. Central to this framework is the idea of splitting available data into distinct training and calibration sets, then using nonconformity (or conformity) scores to calibrate intervals whose empirical marginal coverage matches the target level. Recent advances refine these methods further with localized, model-aware, and efficiency-driven enhancements, addressing average and conditional coverage, computational scalability, adaptation to heteroscedasticity, and robustness to both exchangeability violations and model misspecification.
1. Fundamental Principles and Workflow
The canonical regression-based split conformal predictor proceeds in two main stages. First, the dataset is partitioned into a proper training set, used to fit a regression model $\hat{\mu}$, and a calibration set, used to empirically estimate the distribution of prediction errors. For a significance level $\alpha \in (0, 1)$, the calibration nonconformity scores—typically absolute residuals $s_i = |y_i - \hat{\mu}(x_i)|$—are collected from the calibration set of size $n$. The empirical quantile of these scores at level $\lceil (n+1)(1-\alpha) \rceil / n$, denoted $\hat{q}$, is computed, and the final prediction interval for a new input $x$ is given by

$$\hat{C}(x) = \big[\hat{\mu}(x) - \hat{q},\; \hat{\mu}(x) + \hat{q}\big].$$
Under i.i.d. or (more generally) exchangeable data, the coverage guarantee

$$\mathbb{P}\big(Y_{n+1} \in \hat{C}(X_{n+1})\big) \ge 1 - \alpha$$

holds in finite samples; when the scores are almost surely distinct, coverage is also at most $1 - \alpha + 1/(n+1)$, so the guarantee is near-exact. Splitting the data ensures that calibration is unbiased with respect to the model fit, preventing information leakage.
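To make the workflow concrete, the following is a minimal sketch in Python/NumPy; `mu_hat` is a placeholder for any fitted, vectorized regression model, not a function from the cited literature.

```python
import numpy as np

def split_conformal_interval(mu_hat, X_cal, y_cal, x_new, alpha=0.1):
    """Minimal split CP sketch: returns a (lower, upper) interval for x_new."""
    # Nonconformity scores on the calibration set: absolute residuals.
    scores = np.abs(y_cal - mu_hat(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile level ceil((n+1)(1-alpha)) / n,
    # clipped to 1 for very small calibration sets.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q_hat = np.quantile(scores, level, method="higher")
    pred = mu_hat(x_new)
    return pred - q_hat, pred + q_hat
```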
2. Advances in Nonconformity Measures and Local Adaptivity
The informativeness of the prediction intervals hinges on the nonconformity measure selected. Traditionally, the absolute prediction residual $|y - \hat{\mu}(x)|$ is used. However, this may fail to adapt to spatially varying complexity or heteroscedastic noise, leading to suboptimal (often excessively wide or narrow) intervals in regions of differing difficulty or label variability.
To address this, several normalization and localization strategies have been introduced:
- Normalized Scores via Nearest Neighbours: By dividing the raw residual by an estimate of local difficulty—such as a distance-based density of the nearest neighbours or the local label standard deviation—errors in genuinely difficult regions are penalized less. The normalized nonconformity measure takes the form
  $$s(x, y) = \frac{|y - \hat{\mu}(x)|}{\hat{\sigma}(x)},$$
  where $\hat{\sigma}(x)$ quantifies local difficulty, e.g., via distances to the nearest neighbours (Papadopoulos et al., 2014).
- Kernel-Localized Calibration (SLCP): By employing kernel density estimators over the features, the conditional distribution of residuals is locally approximated (Han et al., 2022). The nonconformity score becomes
  $$s(x, y) = \hat{F}_x\big(|y - \hat{\mu}(x)|\big),$$
  where $\hat{F}_x$ is a kernel-smoothed empirical cdf of residuals in the feature neighbourhood of $x$.
- Variance-Based Normalization and Mondrian Approaches: Adaptive scores such as $|y - \hat{\mu}(x)| / \hat{\sigma}(x)$, with $\hat{\sigma}(x)$ a proxy for the local variance, can yield intervals with (approximately) uniform conditional coverage (Dewolf et al., 2023).
These methods enable the CP procedure to contract (or expand) its intervals in response to local data density and noise level, providing marginal validity together with improved subgroup or conditional validity; a minimal sketch of the variance-normalized variant follows.
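Here, `sigma_hat` is assumed to be any local-difficulty estimate fitted on the training split (e.g., a model of absolute residuals against features); both `mu_hat` and `sigma_hat` are placeholder names.

```python
import numpy as np

def normalized_conformal_interval(mu_hat, sigma_hat, X_cal, y_cal, x_new, alpha=0.1):
    """Split CP with scores |y - mu(x)| / sigma(x); widths adapt to sigma_hat."""
    scores = np.abs(y_cal - mu_hat(X_cal)) / sigma_hat(X_cal)
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q_hat = np.quantile(scores, level, method="higher")
    pred, scale = mu_hat(x_new), sigma_hat(x_new)
    # The interval contracts where sigma_hat is small and widens where it is large.
    return pred - q_hat * scale, pred + q_hat * scale
```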
3. Efficiency: Statistical, Computational, and Algorithmic
Split conformal prediction is computationally efficient relative to full (transductive) CP, which requires retraining or re-evaluating the regression model for each candidate label. After data splitting and training, only a single pass over the calibration data is needed to evaluate nonconformity scores and compute the predictive quantile.
Innovations to further reduce computational cost and enhance statistical efficiency include:
- Trimmed Conformal Prediction (TCP): A two-stage process where a lightweight regressor generates a wide preliminary interval (trimming step), limiting the candidate label search space for full CP with a more accurate model (Chen et al., 2016). This accelerates calibration in high-dimensional or sparse settings.
- Multi-Split and Aggregation Schemes: Multiple independent splits are performed, with the resulting prediction sets aggregated (e.g., via Markov-inequality-based bounds), controlling the probability of coverage violations due to random split variability (Solari et al., 2021); a simple combination rule is sketched below.
- Gauss-Newton Influence Approximation: By leveraging sensitivity-based parameter linearization, full CP for neural networks is approximated without retraining—permitting the exploitation of all available data while maintaining computational tractability (Tailor et al., 27 Jul 2025).
These techniques seek to balance interval tightness, validity, and runtime, with empirical results demonstrating that proper algorithmic choices can achieve both competitive widths and high coverage.
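To illustrate multi-split aggregation, the sketch below uses one simple Markov-style rule: build each of $B$ intervals at miscoverage $\alpha/2$ and keep the points covered by at least half of them. `fit_and_calibrate` is a placeholder for one full split-CP run, and this rule is a simplified instance rather than the exact procedure of Solari et al.

```python
import numpy as np

def majority_vote_set(intervals, k):
    """Sweep endpoints to find the region covered by at least k intervals."""
    events = sorted([(lo, +1) for lo, hi in intervals] +
                    [(hi, -1) for lo, hi in intervals],
                    key=lambda e: (e[0], -e[1]))  # opens sort before closes at ties
    out, count, start = [], 0, None
    for point, delta in events:
        prev, count = count, count + delta
        if prev < k <= count:      # coverage count rises to k: region opens
            start = point
        elif prev >= k > count:    # coverage count drops below k: region closes
            out.append((start, point))
    return out

def multi_split_prediction_set(fit_and_calibrate, data, B=20, alpha=0.1, seed=0):
    """Aggregate B split-CP intervals, each built at miscoverage alpha / 2."""
    rng = np.random.default_rng(seed)
    intervals = [fit_and_calibrate(data, alpha / 2, rng) for _ in range(B)]
    return majority_vote_set(intervals, k=int(np.ceil(B / 2)))
```

If $V$ denotes the fraction of the $B$ intervals missing the true label, then $\mathbb{E}[V] \le \alpha/2$, and Markov's inequality gives $\mathbb{P}(V \ge 1/2) \le \alpha$, bounding the aggregate miscoverage.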
4. Extensions: Function-valued, Bounded, Circular, and Structured Outputs
The fundamental split CP approach has been extended to more complex regression targets and structural constraints:
- Bounded Regression: For outcomes in (0,1), such as proportions or rates, CP can be applied after transformation models (e.g., logit-expit) or beta regression, utilizing quantile, Pearson, or model-specific residuals to define nonconformity scores and ensure prediction sets remain within the feasible range (Wu et al., 18 Jul 2025); a transformation-based sketch is given after this list.
- Function-valued Outputs and Neural Operators: In infinite-dimensional spaces (e.g., PDE solutions), split CP is performed on a discretized version of the function domain, with theoretical guarantees lifted to the function space using bi-Lipschitz continuity of the discretization operator. Diagnostic metrics such as conformal ensemble score and internal agreement quantify coverage in autoregressive forecast scenarios (Millard et al., 4 Sep 2025).
- Circular Data: For regression where responses are angles or elements of the circle, split CP uses angular distance in the conformity score and leverages model projections (e.g., projecting linear regressors onto the circle via sine and cosine parameterizations). Random forests with out-of-bag calibration permit efficient set construction (F. et al., 31 Oct 2024).
- Prediction with Upper and Lower Bounds: When deterministic bounds from physical, optimization, or expert systems are available, split CP can be extended to combine these via multiple nested candidate intervals and selection rules that ensure both efficiency and coverage, even when bounds become extremely tight (Li et al., 6 Mar 2025).
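For the bounded-outcome case, here is a minimal sketch of the transformation route: calibrate on the logit scale and map the interval back with the (monotone) expit, so that coverage is preserved and the interval stays inside (0,1). `mu_hat` is a placeholder for a model fitted to logit-transformed labels.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def expit(z):
    return 1 / (1 + np.exp(-z))

def bounded_conformal_interval(mu_hat, X_cal, y_cal, x_new, alpha=0.1):
    """Split CP for y in (0,1) via calibration on the logit scale."""
    scores = np.abs(logit(y_cal) - mu_hat(X_cal))  # residuals on the logit scale
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q_hat = np.quantile(scores, level, method="higher")
    pred = mu_hat(x_new)
    # expit is strictly increasing, so mapping endpoints back preserves coverage.
    return expit(pred - q_hat), expit(pred + q_hat)
```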
5. Validity under Non-Exchangeability and Label Noise
Classic split CP assumes exchangeability (or i.i.d.) of data points. Recent developments offer coverage guarantees in the presence of dependence, such as time series, spatial, or nonstationary data:
- Concentration and Decoupling Framework: Validity is established under concentration of calibration empirical distributions and decoupling between training and test samples, resulting in coverage up to an explicit penalty quantifiable via process-specific parameters (e.g., β-mixing coefficients) (Oliveira et al., 2022).
- Noise-Robust Calibration: When calibration labels are contaminated with additive noise, the effective calibration threshold can be estimated via empirical deconvolution using a known or estimated noise kernel, leading to interval lengths and coverage close to (unknown) clean-label baselines (Cohen et al., 18 Sep 2025).
Both lines of analysis demonstrate that, even when the standard exchangeability premise is violated or noisy calibration labels are present, the split conformal framework can be adapted to deliver valid coverage with only minor (quantified) relaxation.
6. Conditional Validity, Interval Efficiency, and Length Optimization
A persistent challenge is ensuring that prediction sets are not only valid on average (marginal coverage) but also valid within subpopulations or under covariate shifts ("conditional validity"). Recent frameworks formalize this dual objective:
- Conditional Validity via Group/Taxonomy Structure: By partitioning the feature space (e.g., Mondrian conformal prediction), coverage is calibrated within each group or region, yielding local guarantees that adapt to subgroup-specific error distributions (Dewolf et al., 2023).
- Length Optimization (CPL Framework): A minimax dual formulation is established that seeks to satisfy coverage constraints under covariate shifts while minimizing the average prediction set length (Kiyani et al., 27 Jun 2024). The optimal prediction region is characterized as a level set of the form
  $$\hat{C}(x) = \{\, y : \hat{f}(y \mid x) \ge g(x) \,\}$$
  for some threshold function $g$ and estimated conditional density $\hat{f}$. In finite samples, the optimization is regularized and solved over structured set classes, yielding intervals that are both short and conditionally valid; a simplified fixed-threshold sketch follows.
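The sketch below conformalizes a density level set with a single constant threshold (i.e., $g(x) \equiv$ const), which is a simplification of the CPL procedure rather than its learned threshold function; `f_hat(y, x)` is a placeholder conditional-density estimate.

```python
import numpy as np

def level_set_region(f_hat, X_cal, y_cal, x_new, y_grid, alpha=0.1):
    """Prediction region {y : f_hat(y | x_new) >= threshold} on a label grid."""
    # Score is the negative estimated density at the observed label,
    # so unlikely labels receive high scores.
    scores = np.array([-f_hat(y, x) for x, y in zip(X_cal, y_cal)])
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    threshold = -np.quantile(scores, level, method="higher")
    densities = np.array([f_hat(y, x_new) for y in y_grid])
    return y_grid[densities >= threshold]  # may be a union of intervals
```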
7. Algorithmic Posterior and Predictive Distributions
Beyond set-valued predictions, split CP methodologies have been extended to produce algorithmic posterior predictive distributions:
- Split Conformal Predictive Systems (SCPS): These output a distribution over possible outcomes by comparing candidate scores to the empirical distribution of calibration scores, yielding well-calibrated predictive probabilities (Vovk et al., 2019); a minimal sketch is given below.
- Regression as Classification CP: By discretizing and recasting the regression problem into a classification task (using “distance-aware” cross-entropy losses), conformal prediction sets may capture heteroscedastic or multimodal distributions, with intervals often narrower and coverage preserved (Guha et al., 12 Apr 2024).
These techniques provide more informative uncertainty quantification, suitable for risk-based decision-making and complex regression settings.
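A minimal sketch of the SCPS idea: the predictive CDF at a candidate label is essentially the rank of its signed residual among the calibration residuals. This omits the randomized tie-breaking of the exact construction, and `mu_hat` is again a placeholder fitted model.

```python
import numpy as np

def conformal_predictive_cdf(mu_hat, X_cal, y_cal, x_new, y_candidates):
    """Approximate predictive CDF values at y_candidates for input x_new."""
    residuals = np.sort(y_cal - mu_hat(X_cal))  # signed calibration residuals
    n = len(residuals)
    # Rank of each candidate's residual among the calibration residuals;
    # the exact SCPS construction randomizes within the resulting 1/(n+1) band.
    ranks = np.searchsorted(residuals, y_candidates - mu_hat(x_new), side="right")
    return (ranks + 1) / (n + 1)
```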
Conclusion
Regression-based split conformal prediction frameworks provide a versatile, theoretically robust approach to constructing prediction sets or intervals for regression problems. Through careful choice of nonconformity scores, calibration strategies, and model structure, they offer finite-sample, distribution-free coverage guarantees even under challenging conditions. Modern extensions—addressing local adaptivity, computational scalability, conditional validity, robustness to bias, dependence, model misspecification, noise, and complex output structures—have broadened the applicability and improved the efficiency of split CP for practical regression tasks across a diverse array of domains.