Conformalized Quantile Regression
- Conformalized Quantile Regression is a framework that integrates quantile regression with conformal prediction to produce adaptive prediction intervals with strict finite-sample coverage.
- It employs a data-split approach with robust quantile estimators and a conformal calibration step to maintain accuracy even under heteroscedasticity and high-dimensional conditions.
- Recent advancements enhance CQR through local adaptivity, density calibration, and multivariate extensions, improving performance in structured or non-i.i.d. data scenarios.
Conformalized Quantile Regression (CQR) is a statistical framework that integrates quantile regression—a methodology for estimating conditional quantiles—with conformal prediction, which provides finite-sample, distribution-free marginal coverage guarantees. This fusion yields prediction intervals that adapt to heteroscedasticity and complex data geometry while maintaining rigorous statistical validity. CQR is widely used in modern machine learning and statistics as it balances flexibility, finite-sample control, and computational tractability. Recent research focuses on improving its local adaptivity, conditional coverage, high-dimensional scalability, multivariate extension, and robustness in structured or non-i.i.d. data regimes.
1. Problem Setup and Core CQR Methodology
CQR addresses the standard regression setting: observing i.i.d. data $(X_1, Y_1), \dots, (X_n, Y_n)$ from an unknown distribution $P$ over $\mathcal{X} \times \mathcal{Y}$ (usually $\mathcal{X} \subseteq \mathbb{R}^d$, $\mathcal{Y} \subseteq \mathbb{R}$), the aim is to produce, for any new test instance $X_{n+1}$, an interval $C(X_{n+1})$ such that
$$\mathbb{P}\{Y_{n+1} \in C(X_{n+1})\} \ge 1 - \alpha,$$
with the probability being over all randomness in the data, including the test pair $(X_{n+1}, Y_{n+1})$.
The basic CQR algorithm (Romano et al., 2019, Sousa et al., 2022, Sesia et al., 2019) consists of:
- Data Split: Divide data into a (proper) training set $\mathcal{I}_1$ and a calibration set $\mathcal{I}_2$.
- Quantile Regressors: Train two quantile regression models on $\mathcal{I}_1$, targeting the lower and upper conditional quantiles $\hat{q}_{\alpha_{lo}}(x)$ and $\hat{q}_{\alpha_{hi}}(x)$ with $\alpha_{lo} = \alpha/2$, $\alpha_{hi} = 1 - \alpha/2$, by minimizing
$$\rho_\tau(y, \hat{y}) = \max\{\tau (y - \hat{y}),\; (\tau - 1)(y - \hat{y})\},$$
where $\rho_\tau$ is the pinball (quantile) loss at level $\tau$.
- Conformal Calibration: On $\mathcal{I}_2$, compute nonconformity scores
$$E_i = \max\{\hat{q}_{\alpha_{lo}}(X_i) - Y_i,\; Y_i - \hat{q}_{\alpha_{hi}}(X_i)\}.$$
Let $Q_{1-\alpha}(E; \mathcal{I}_2)$ denote the empirical $(1-\alpha)(1 + 1/|\mathcal{I}_2|)$-th quantile of $\{E_i : i \in \mathcal{I}_2\}$.
- Interval Construction: For a test point $X_{n+1}$,
$$C(X_{n+1}) = \big[\hat{q}_{\alpha_{lo}}(X_{n+1}) - Q_{1-\alpha}(E; \mathcal{I}_2),\; \hat{q}_{\alpha_{hi}}(X_{n+1}) + Q_{1-\alpha}(E; \mathcal{I}_2)\big].$$
Exchangeability of calibration and test points guarantees marginal coverage at level $1-\alpha$ (Romano et al., 2019, Sousa et al., 2022).
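The split-train-calibrate-predict recipe above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: it assumes synthetic heteroscedastic data and uses sklearn's `GradientBoostingRegressor` with `loss="quantile"` as a stand-in quantile learner.

```python
# Minimal split-CQR sketch (assumptions: toy data, gradient boosting
# as the quantile regressor; any pinball-loss learner would do).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, alpha = 2000, 0.1

# Heteroscedastic toy data: noise scale grows with |x|.
X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.1 + 0.4 * np.abs(X[:, 0]))

# 1) Data split: proper training set I1 and calibration set I2.
X_tr, y_tr = X[: n // 2], y[: n // 2]
X_cal, y_cal = X[n // 2 :], y[n // 2 :]

# 2) Fit lower/upper quantile regressors at alpha/2 and 1 - alpha/2.
q_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
q_hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# 3) Nonconformity scores E_i = max(q_lo(X_i) - Y_i, Y_i - q_hi(X_i)).
scores = np.maximum(q_lo.predict(X_cal) - y_cal, y_cal - q_hi.predict(X_cal))

# 4) Finite-sample-corrected empirical quantile of the scores.
k = int(np.ceil((1 - alpha) * (len(scores) + 1)))
Q = np.sort(scores)[k - 1]

# 5) Conformalized interval for new points: [q_lo - Q, q_hi + Q].
X_test = np.array([[0.0], [1.5]])
lower = q_lo.predict(X_test) - Q
upper = q_hi.predict(X_test) + Q
```

Note how the same shift $Q$ is applied everywhere; the interval's adaptivity to $x$ comes entirely from the two fitted quantile functions.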
2. Adaptivity, Efficiency, and Marginal Coverage
CQR inherits adaptivity from quantile regression—interval width naturally reflects local conditional spread, enabling the method to handle heteroscedasticity. The conformal calibration step ensures exact finite-sample marginal coverage, regardless of the underlying noise distribution or estimator bias (Romano et al., 2019).
Extensive experiments demonstrate that CQR achieves the intended coverage with average interval widths substantially smaller than those of non-adaptive or constant-width conformal methods, especially in heteroscedastic and high-dimensional regimes (Romano et al., 2019, Sesia et al., 2019).
Non-asymptotic efficiency analyses (see (Yao et al., 8 Oct 2025)) quantify the deviation of the CQR interval from the oracle width as a function of the training set size, the calibration set size, and the target error rate $\alpha$, capturing the tradeoff between sample allocation and coverage/efficiency.
3. Extensions: Local and Density-Adaptive Conformalization
Classic CQR applies a single conformal shift across the entire input domain, which limits adaptivity in highly heterogeneous settings. Improved methods stratify the calibration step via clustering, local density calibration, or hybrid approaches.
Weighted k-Means Stratification (Sousa et al., 2022). The ICQR procedure clusters the calibration data in feature space (with permutation-importance weighting of the features), applies separate conformal calibration in each cluster, and assigns test points to clusters by feature proximity. Each cluster receives its own conformal quantile shift, increasing local adaptivity to variable heteroscedasticity:
- Clustering by feature permutation-importance,
- Cluster-specific conformal quantiles,
- Coverage guarantee holds within each cluster and globally.
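A simplified sketch of this stratified calibration follows. It is loosely modeled on the ICQR idea under stated simplifications: plain (unweighted) `KMeans` stands in for the permutation-importance-weighted clustering, and the calibration-set quantile predictions are simulated placeholders rather than outputs of fitted regressors.

```python
# Cluster-stratified conformal calibration (simplified ICQR-style sketch).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
alpha, n_clusters = 0.1, 3

# Placeholder calibration data and quantile predictions (illustrative only;
# in practice these come from the fitted lower/upper quantile models).
X_cal = rng.uniform(-2, 2, size=(600, 1))
y_cal = rng.normal(0, 0.1 + 0.4 * np.abs(X_cal[:, 0]))
pred_lo = -0.3 * np.ones(600)
pred_hi = 0.3 * np.ones(600)
scores = np.maximum(pred_lo - y_cal, y_cal - pred_hi)

# Cluster calibration points in feature space; each cluster gets its own
# finite-sample-corrected conformal quantile shift.
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_cal)
Q_by_cluster = {}
for c in range(n_clusters):
    s = np.sort(scores[km.labels_ == c])
    k = int(np.ceil((1 - alpha) * (len(s) + 1)))
    Q_by_cluster[c] = s[min(k, len(s)) - 1]

# At test time, assign each point to its nearest cluster and apply
# that cluster's shift to the quantile predictions.
X_test = np.array([[-1.5], [0.0], [1.8]])
c_test = km.predict(X_test)
shifts = np.array([Q_by_cluster[c] for c in c_test])
```

With per-cluster quantiles, regions of low noise receive smaller shifts than regions of high noise, which is exactly the local adaptivity a single global shift cannot provide.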
Density-Calibrated CQR (CQR-d) (Lu, 2024) introduces hybrid local-global conformity scores, using k-NN density estimates:
- Compute local and global conformity quantiles,
- Blend scores via density-based weights,
- Optimize a global scaling hyperparameter for calibration,
- Achieves approximately the nominal $1-\alpha$ coverage (up to a small slack arising from the scaling optimization) and produces intervals up to 21% narrower in simulated heteroscedastic settings.
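The local-global blending can be illustrated as follows. This is a rough sketch in the spirit of CQR-d, not the paper's procedure: the density-based weight `w(x)` and its `0.5` length scale are illustrative assumptions, and the conformity scores are simulated stand-ins.

```python
# Density-blended conformal quantile: mix a local k-NN quantile with the
# global conformal quantile, trusting the local one more in dense regions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
alpha, k_nn = 0.1, 50

X_cal = rng.uniform(-2, 2, size=(800, 1))
scores = rng.normal(0, 0.1 + 0.4 * np.abs(X_cal[:, 0]))  # stand-in scores

def finite_sample_quantile(s, alpha):
    s = np.sort(s)
    k = int(np.ceil((1 - alpha) * (len(s) + 1)))
    return s[min(k, len(s)) - 1]

Q_global = finite_sample_quantile(scores, alpha)
nn = NearestNeighbors(n_neighbors=k_nn).fit(X_cal)

def blended_shift(x):
    dist, idx = nn.kneighbors(x.reshape(1, -1))
    Q_local = finite_sample_quantile(scores[idx[0]], alpha)
    # k-NN density proxy: tighter neighborhoods -> higher density -> larger
    # weight on the local quantile.  The 0.5 scale is an arbitrary choice.
    w = np.exp(-dist.mean() / 0.5)
    return w * Q_local + (1 - w) * Q_global

shift_center = blended_shift(np.array([0.0]))  # low-noise region
shift_edge = blended_shift(np.array([1.9]))    # high-noise region
```

On this toy data the low-noise center receives a smaller blended shift than the high-noise edge, mimicking how density calibration narrows intervals where the conditional spread is small.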
4. Multivariate, High-Dimensional, and Flexible Model Extensions
CQR has been generalized to several complex regimes:
- Multivariate Responses (Kondratyev et al., 29 Sep 2025): Conditional vector quantile regression with neural optimal transport enables high-dimensional, geometry-adaptive conformal prediction sets, exploiting the OT structure for joint distribution coverage guarantees.
- Ensembles and Aggregation (Fakoor et al., 2021): Weighted ensembles of base quantile models, post-hoc monotonicity corrections (sorting/isotonic regression), and cross-conformal calibration (QOOB) achieve state-of-the-art risk/efficiency profiles in large benchmark evaluations.
- Dynamic Quantization for High-Density Regions (Cengiz et al., 2024): Adaptive prototype-based quantile methods provide region-based, non-convex, high-density coverage with scalable dynamic allocation/removal of quantization bins during training. Conformal regions are calculated as unions of high-density Voronoi cells covering the target $1-\alpha$ probability mass.
5. Conditional Coverage, Uncertainty, and Practical Considerations
While CQR provides finite-sample marginal coverage, exact conditional coverage cannot be achieved in finite samples in a distribution-free way. Nevertheless, numerous approaches improve conditional coverage in practice:
- Density-Weighted and Conditional Losses (Chen et al., 30 Dec 2025): By incorporating local density estimates or penalizing coverage error at the conditional level (via density-weighted pinball losses or auxiliary quantile heads), one can reduce mean squared conditional coverage error and improve worst-slice/partition guarantees while retaining approximate marginal validity.
- Uncertainty-Aware CQR (UACQR) (Rossellini et al., 2023): Explicitly separates epistemic (estimation) from aleatoric (conditional) uncertainty by ensemble quantile models and inflates intervals adaptively where quantile estimation is less certain. UACQR yields improved conditional coverage in both simulations and real datasets.
- Support for Non-i.i.d. Data (Jensen et al., 2022): Ensemble bootstrap mechanisms (EnCQR) extend CQR to time series and non-exchangeable data by forming leave-one-out residuals from ensembles, maintaining approximate marginal validity.
- Efficiency Tuning (Sesia et al., 2019, Yao et al., 8 Oct 2025): The data split fraction (training/calibration) and quantile estimation hyperparameters should be tuned empirically, choosing the training fraction to balance accurate quantile estimation against stable conformal quantile calibration.
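The uncertainty-aware idea can be sketched concretely. This is a loose illustration of the UACQR principle rather than the paper's exact algorithm: an ensemble of bootstrapped quantile regressors estimates the quantiles, and ensemble disagreement (a proxy for epistemic uncertainty) inflates the bounds before standard conformal calibration; the inflation factor `gamma` is an assumption of this sketch.

```python
# Uncertainty-aware widening before conformalization (UACQR-style sketch).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
n, alpha, n_models, gamma = 1500, 0.1, 5, 1.0

X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.1 + 0.4 * np.abs(X[:, 0]))
X_tr, y_tr, X_cal, y_cal = X[:1000], y[:1000], X[1000:], y[1000:]

# Bootstrap ensemble of lower/upper quantile regressors.
models_lo, models_hi = [], []
for _ in range(n_models):
    idx = rng.integers(0, len(X_tr), len(X_tr))  # bootstrap resample
    models_lo.append(GradientBoostingRegressor(
        loss="quantile", alpha=alpha / 2).fit(X_tr[idx], y_tr[idx]))
    models_hi.append(GradientBoostingRegressor(
        loss="quantile", alpha=1 - alpha / 2).fit(X_tr[idx], y_tr[idx]))

def ensemble_bounds(X_eval):
    lo = np.stack([m.predict(X_eval) for m in models_lo])
    hi = np.stack([m.predict(X_eval) for m in models_hi])
    # Widen by ensemble spread where quantile estimation is uncertain.
    return lo.mean(0) - gamma * lo.std(0), hi.mean(0) + gamma * hi.std(0)

# Standard CQR calibration, applied to the widened bounds.
lo_cal, hi_cal = ensemble_bounds(X_cal)
scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
k = int(np.ceil((1 - alpha) * (len(scores) + 1)))
Q = np.sort(scores)[k - 1]
```

Because the calibration step is unchanged, the finite-sample marginal guarantee is preserved; the inflation only redistributes width toward regions where the quantile estimates themselves are unreliable.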
6. Algorithmic Summary and Theoretical Guarantees
Standard CQR Algorithm (Romano et al., 2019, Sousa et al., 2022):
- Split: $\mathcal{D} = \mathcal{I}_1 \cup \mathcal{I}_2$.
- Train: Quantile regressors at levels $\alpha/2$ and $1-\alpha/2$ on $\mathcal{I}_1$.
- Compute: Residuals $E_i = \max\{\hat{q}_{\alpha_{lo}}(X_i) - Y_i,\; Y_i - \hat{q}_{\alpha_{hi}}(X_i)\}$ on $\mathcal{I}_2$.
- Quantile: Empirical $(1-\alpha)(1 + 1/|\mathcal{I}_2|)$-th quantile $Q_{1-\alpha}$ of the $E_i$.
- Predict: For $X_{n+1}$, output $\big[\hat{q}_{\alpha_{lo}}(X_{n+1}) - Q_{1-\alpha},\; \hat{q}_{\alpha_{hi}}(X_{n+1}) + Q_{1-\alpha}\big]$.
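The quantile step deserves a worked check, since the finite-sample correction is easy to get wrong: with $n_{cal}$ calibration scores, CQR uses the $\lceil (1-\alpha)(n_{cal}+1) \rceil$-th smallest score rather than the plain empirical $(1-\alpha)$-quantile.

```python
# Worked check of the finite-sample quantile index used in CQR calibration.
import math

def conformal_index(n_cal, alpha):
    # Rank (1-based) of the calibration score used as the conformal shift.
    return math.ceil((1 - alpha) * (n_cal + 1))

# With 99 calibration points and alpha = 0.1, the 90th smallest score is
# used -- the empirical ~90.9%-quantile, slightly above the naive 90%.
idx = conformal_index(99, 0.1)   # -> 90
frac = idx / 99                  # -> ~0.909
```

That small upward shift of the quantile level is exactly what converts approximate coverage into the exact finite-sample guarantee stated next.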
Guarantee: For any exchangeable sample (calibration + test),
$$\mathbb{P}\{Y_{n+1} \in C(X_{n+1})\} \ge 1 - \alpha.$$
The guarantee remains exact for split-conformal CQR; cross-conformal and full-conformal variants can achieve near-exact guarantees at higher computational cost.
7. Limitations, Extensions, and Current Directions
Limitations:
- Provides only marginal, not conditional, coverage.
- Relies critically on the quality of quantile regressors; poorly calibrated or misspecified quantiles yield conservative (wide) intervals to maintain coverage (Romano et al., 2019).
- The clustering/density-adaptive extensions add algorithmic and computational complexity, especially for high-dimensional feature spaces (Sousa et al., 2022, Lu, 2024).
Extensions/Future Work:
- Further improving local adaptivity through continuous stratification, dynamic density estimation, or covariate-informed conformalization.
- Online/streaming, multi-fidelity, or structured data settings (Salinas et al., 2023).
- Multivariate and tensor-valued CQR (Kondratyev et al., 29 Sep 2025, Bahmani, 24 Jan 2026).
- Improved group, subgroup, or conditional coverage through partition learning, density-weighted calibration, and uncertainty estimation strategies (Chen et al., 30 Dec 2025, Rossellini et al., 2023, Alaa et al., 2023).
- Theoretical analyses of finite-sample and asymptotic properties, especially regarding efficiency, conditional guarantee relaxations, and the tradeoffs inherent in data allocation (Yao et al., 8 Oct 2025, Sesia et al., 2019).
Conformalized Quantile Regression constitutes an active research frontier at the intersection of conformal inference, robust regression, uncertainty quantification, and machine learning. Methodological innovations and theoretical refinements continue to extend its utility and optimality across diverse applications such as time-series forecasting, hyperparameter optimization, tensor prediction, and structured probabilistic modeling (Jensen et al., 2022, Salinas et al., 2023, Bahmani, 24 Jan 2026, Cengiz et al., 2024, Lu, 2024, Kondratyev et al., 29 Sep 2025).