
Split Conformal Prediction & Unsupervised Calibration

Updated 10 October 2025
  • The paper introduces an adaptive method that conformalizes any predictive model to ensure finite-sample coverage guarantees using empirical quantiles.
  • It leverages unsupervised calibration via pseudo-labels and optimized weighting to maintain robust performance even under non-IID or partially labeled settings.
  • Practical implications include efficient split and cross-conformal techniques that adjust prediction intervals for both regression and classification tasks.

Split conformal prediction with unsupervised calibration refers to a suite of methodologies for constructing statistically valid prediction sets or distributions using only a “base” predictive system, while minimizing or entirely eliminating the need for labeled calibration data. This is achieved through adaptive post-processing—termed conformalization—which delivers finite-sample (marginal) probability calibration guarantees and, under certain regularity, can be effective even in non-IID or partially labeled settings. This approach relies on data-driven empirical quantiles from calibration samples, strategic weighting or optimization for pseudo-labels, and randomized tie-breaking, and extends naturally to both regression and classification. The following sections review foundational theoretical results, calibration algorithms, relaxation of the exchangeability assumption, unsupervised calibration schemes, performance guarantees, and contemporary applications.

1. General Framework and Calibration Methodology

The core idea is to start from any predictive system (denoted by $A$)—which may provide point predictions or probabilistic outputs—and “conformalize” it by comparing its predictions to those observed in a calibration sample. For a new test input $x$ (and candidate label $y$), a conformity score $\alpha^y = A(\mathbf{z}_1,\ldots,\mathbf{z}_m, (x,y))$ is evaluated. The calibration module then compares $\alpha^y$ to the scores $\{\alpha_i\}$ observed on calibration data. The split-conformal predictive system (SCPS) produces a calibrated predictive distribution

$$C^A(\mathbf{z}_1,\ldots,\mathbf{z}_n,(x,y),\tau) = \frac{\#\{i : \alpha_i < \alpha^y\}}{n-m+1} + \tau\,\frac{\#\{i : \alpha_i = \alpha^y\}}{n-m+1} + \frac{\tau}{n-m+1}$$

where $\tau \sim \mathcal{U}[0,1]$ provides randomized tie-breaking. This ensures that, under exchangeability, the calibrated predictive CDF $Q(\cdot)$ satisfies $P(Q \leq \alpha) = \alpha$ for all $\alpha \in [0,1]$. This construction is fully adaptive: the output system can adjust both the location and shape of the predictive distribution for each $x$.

The adaptivity and universality of this method mean that no prior assumptions (such as calibration or validity) are required of $A$. In practice, $A$ can be a pointwise scoring rule, a quantile regression, or a full predictive density.
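
A minimal sketch of the SCPS formula above, assuming the conformity scores on the calibration split have already been computed; the residual-based score, data, and function names are illustrative rather than taken from any specific implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def scps_cdf_value(cal_scores, alpha_y, tau):
    """Calibrated predictive CDF value for a candidate label y.

    cal_scores : conformity scores alpha_i on the calibration split (n - m of them)
    alpha_y    : conformity score of the test pair (x, y)
    tau        : one U[0,1] draw for randomized tie-breaking
    """
    n_cal = len(cal_scores)            # n_cal + 1 plays the role of n - m + 1
    n_less = np.sum(cal_scores < alpha_y)
    n_tied = np.sum(cal_scores == alpha_y)
    return (n_less + tau * n_tied + tau) / (n_cal + 1)

# Toy illustration with a hypothetical residual-based conformity score
# alpha = y - model(x), so the values below trace a predictive CDF in y.
cal_residuals = rng.normal(size=200)   # stand-in calibration scores
tau = rng.uniform()
grid = np.linspace(-3, 3, 7)           # candidate values of alpha^y
print([round(scps_cdf_value(cal_residuals, a, tau), 3) for a in grid])
```

Evaluating `scps_cdf_value` over a grid of candidate labels yields the full calibrated predictive distribution for a single test input.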

2. Split and Cross-Conformal Calibration Algorithms

The split-conformal approach divides the dataset into training and calibration partitions: the training partition is used to fit $A$, while the calibration partition is used only to evaluate conformity scores. This split offers computational efficiency by requiring only one model fit, and remains robust even when the base system is poorly calibrated.
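
As a concrete illustration, here is a hedged sketch of split-conformal prediction intervals for regression with absolute-residual scores; the Ridge base model, synthetic data, and 50/50 split are arbitrary stand-ins:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=600)

# One split: the proper training set fits the base model A,
# the calibration set only supplies conformity (absolute residual) scores.
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = Ridge().fit(X_tr, y_tr)

alpha = 0.1
scores = np.abs(y_cal - model.predict(X_cal))
n = len(scores)
# ceil((1 - alpha)(n + 1))-th order statistic of the calibration scores
k = int(np.ceil((1 - alpha) * (n + 1)))
q_hat = np.sort(scores)[min(k, n) - 1]

X_test = rng.normal(size=(5, 3))
pred = model.predict(X_test)
intervals = np.column_stack([pred - q_hat, pred + q_hat])
print(intervals)
```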

Cross-conformal methods extend this principle by using multiple splits (folds), constructing several SCPS calibrated on different subsets, and aggregating the resulting outputs. Cross-conformal calibration achieves higher data efficiency at the cost of weaker (non-uniform) theoretical guarantees: the marginal calibration guarantee becomes approximate, but overall performance can improve in small-to-moderate sample regimes.
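
A sketch of one common cross-conformal variant, which pools out-of-fold scores into a single empirical quantile and averages the fold models at prediction time; as noted above, the marginal guarantee for such aggregated schemes is only approximate, and the model and data here are placeholders:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=400)

alpha = 0.1
oof_scores, fold_models = [], []
for tr_idx, cal_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m = Ridge().fit(X[tr_idx], y[tr_idx])
    oof_scores.append(np.abs(y[cal_idx] - m.predict(X[cal_idx])))
    fold_models.append(m)

# Pool the out-of-fold scores and use a single empirical quantile;
# predictions for a new point are averaged over the fold models.
scores = np.concatenate(oof_scores)
q_hat = np.quantile(scores, min(1.0, (1 - alpha) * (1 + 1 / len(scores))))

x_new = rng.normal(size=(1, 3))
center = np.mean([m.predict(x_new)[0] for m in fold_models])
print((center - q_hat, center + q_hat))
```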

When labels are only weakly observed, or unavailable in the calibration set, unsupervised calibration can leverage pseudo-labels, partial labels, or weights derived by matching the predicted class distribution of the test data to that of the training set via an optimization problem. This can involve solving for weights $w_i(y)$ for each calibration sample $X_i$ and each possible label $y$, so that the weighted score distribution matches the labeled training data. The prediction quantile is then computed over the weighted scores:

$$\hat{q} = \mathrm{Quantile}\big(\{(S(X_i,y),\, w_i(y)/n)\}_{i,y};\ (1-\alpha)(1+1/n)\big).$$

Theoretical guarantees bound the excess coverage loss as a function of the chosen function class, optimization error, and statistical complexity terms such as Rademacher complexity.
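
Once weights $w_i(y)$ are available, the weighted quantile above can be computed directly. The sketch below uses softmax-style class probabilities as stand-in weights purely for illustration; in the optimization-based schemes described here, the weights would instead be obtained by matching the weighted score distribution to the labeled training data:

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative (normalized) weight reaches level q."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cum = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cum, min(q, 1.0 - 1e-12))]

rng = np.random.default_rng(3)
n, K, alpha = 200, 4, 0.1

# Illustrative stand-ins for an unlabeled calibration set of size n with K classes.
probs = rng.dirichlet(np.ones(K), size=n)   # stand-in for p_hat(y | X_i)
S = 1.0 - probs                             # score S(X_i, y) = 1 - p_hat(y | X_i)
w = probs / n                               # stand-in weights w_i(y) / n

q_hat = weighted_quantile(S.ravel(), w.ravel(), (1 - alpha) * (1 + 1 / n))

# Prediction set for a new point: all labels whose score falls below q_hat.
p_test = rng.dirichlet(np.ones(K))
pred_set = np.where(1.0 - p_test <= q_hat)[0]
print(q_hat, pred_set)
```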

3. Relaxing Exchangeability: Calibration under Dependence and Data Contamination

Standard SCPS coverage guarantees rely on the exchangeability or IID assumption. However, in practical settings—such as time series, spatiotemporal, or contaminated data—exchangeability may fail. Several frameworks address this via explicit error quantification:

  • Concentration and decoupling: Coverage errors are bounded by a sum of calibration concentration ($\varepsilon_\text{cal}$), decoupling/test error ($\varepsilon_\text{test}$), and small slack terms due to sample size.
  • Coverage penalty: The marginal coverage becomes

$$P(\text{coverage}) \geq 1 - \alpha - \eta, \qquad \text{where } \eta = \varepsilon_\text{cal} + \delta_\text{cal} + \varepsilon_\text{test}.$$

  • Data contamination: In the presence of Huber-type contamination, coverage bounds incorporate the Kolmogorov–Smirnov distance between the clean and contaminated score CDFs, and bias-corrected procedures (Contamination Robust Conformal Prediction, CRCP) adjust the selection quantile to correct for contamination effects.

For time series, the “switch coefficient” measures how much temporal dependence disrupts exchangeability; if the underlying process is stationary $\beta$-mixing, coverage penalties scale with the mixing rate.

4. Unsupervised and Weakly Supervised Calibration

In scenarios where calibration labels are missing, only partially available, or costly to acquire, unsupervised calibration methods are employed.

  • Unlabeled or partially labeled calibration sets: Conformal sets are constructed by estimating the distribution of conformity scores using model predictions. If the prediction model has accuracy $1-\beta$, the coverage guarantee degrades additively, yielding

$$P(Y \in \mathcal{C}(X_\text{test})) \geq 1-\alpha - \beta.$$

This reveals a principled tradeoff between model error and calibration-free coverage (a small simulation of this scheme follows this list).

  • Weighted calibration via optimization: The use of convex optimization or kernel methods aligns the empirical (weighted) score distribution of the unlabeled calibration set with that of the labeled training set. Under certain regularity conditions, the marginal coverage loss is bounded by terms characterizing optimization error, RKHS norm, and empirical complexities.
  • Self-supervised calibration: Augmenting nonconformity score computation with auxiliary self-supervised losses allows the model to capture local uncertainty or “difficulty,” leading to improved adaptation without requiring additional labeled calibration data.
  • Partial label or set-valued supervision: Score functions are designed to be pessimistic (worst-case over all candidates in a set) or aggregate over candidate labels, ensuring that coverage is controlled given the available ambiguity.
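
A small simulation of the pseudo-label scheme from the first bullet, comparing empirical coverage against the degraded target $1-\alpha-\beta$; the classifier, synthetic data, and split sizes are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(3000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=3000) > 0).astype(int)

X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.6, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
alpha = 0.1

# Unsupervised calibration: replace the (assumed unavailable) calibration labels
# with the model's own pseudo-labels when computing conformity scores.
pseudo = clf.predict(X_cal)
probs_cal = clf.predict_proba(X_cal)
scores = 1.0 - probs_cal[np.arange(len(pseudo)), pseudo]
n = len(scores)
q_hat = np.quantile(scores, min(1.0, (1 - alpha) * (1 + 1 / n)))

# Prediction sets and empirical coverage; the guarantee degrades from
# 1 - alpha to roughly 1 - alpha - beta, with beta the model's error rate.
probs_te = clf.predict_proba(X_te)
covered = (1.0 - probs_te[np.arange(len(X_te)), y_te]) <= q_hat
beta = np.mean(clf.predict(X_cal) != y_cal)   # error rate, for reference only
print(f"coverage={covered.mean():.3f}  target>={1 - alpha - beta:.3f}")
```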

5. Calibration Guarantees and Finite Sample Behavior

Analyses of SCPS yield finite-sample marginal and conditional coverage guarantees. If $n$ is the calibration size and the conformal threshold is chosen as the $\lceil (1-\alpha)(n+1) \rceil$-th order statistic, then under exchangeability (and a regular conformity function) the empirical coverage for $m$ test samples

$$C_m^{(n,\alpha)} = \frac{1}{m} \sum_{i=1}^m \mathbb{I}\{ Y_{n+i} \in \mathcal{C}_n^{(\alpha)}(X_{n+i}) \}$$

follows a Beta–Binomial law. As $m \to \infty$, the limit is $\mathrm{Beta}(b,g)$ where $b = \lceil (1-\alpha)(n+1) \rceil$ and $g = n-b+1$. This permits explicit sample size selection for target coverage intervals.

In small-data settings, the Small Sample Beta Correction (SSBC) adjusts the nominal risk level to attain a desired probably approximately correct (PAC) guarantee, by selecting $\alpha_\text{adj} < \alpha_\text{target}$ so that (with probability $1-\delta$ over the calibration set) at least $(1-\alpha_\text{target})$ coverage is achieved.
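
Both the Beta limit law and an SSBC-style adjustment can be computed in a few lines. The sketch below assumes the $\mathrm{Beta}(b,g)$ parametrization stated above and uses a simple grid search for the adjusted level; the function names and the search grid are ad hoc choices:

```python
import numpy as np
from scipy.stats import beta

def coverage_beta(n, alpha):
    """Limiting Beta law of coverage for a split conformal set with n
    calibration points and nominal miscoverage alpha."""
    b = int(np.ceil((1 - alpha) * (n + 1)))
    g = n - b + 1
    return beta(b, g)

def ssbc_alpha(n, alpha_target, delta):
    """Largest nominal alpha whose coverage law puts at least 1 - delta
    mass above 1 - alpha_target (simple descending grid search)."""
    for a in np.linspace(alpha_target, 1.0 / (n + 1), 2000):
        if coverage_beta(n, a).ppf(delta) >= 1 - alpha_target:
            return a
    return 1.0 / (n + 1)   # conservative fallback: threshold at the max score

n, alpha_target, delta = 200, 0.1, 0.05
dist = coverage_beta(n, alpha_target)
print("mean coverage:", round(dist.mean(), 4))
print("5% quantile :", round(dist.ppf(0.05), 4))
print("SSBC-adjusted alpha:", round(ssbc_alpha(n, alpha_target, delta), 4))
```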

6. Extensions, Applications, and Limitations

Extensions

  • Function-valued outputs: For neural operators and other infinite-dimensional regression settings, conformal calibration is achieved through discretization. Guarantees are lifted to the function space by controlling discretization error, calibration statistical error, and model misspecification.
  • Circular data: Adapting conformity scores to angular prediction, with out-of-bag methods enabling unsupervised calibration in the absence of explicit calibration labels.
  • Distribution shifts: Adaptive techniques reweight calibration data using domain classifiers or similarity in embedding space to guard coverage under subpopulation or covariate shift. Optimal transport principles (e.g., 1-Wasserstein distance between calibration and test scores) provide bounds on the loss of coverage due to arbitrary distributional changes.
  • Healthcare, vision–language, and low-data regimes: Split conformal prediction augments black-box deep models in critical domains, with unsupervised or weakly supervised calibration approaches filling the gap when labeled calibration sets are limited or unavailable.

Limitations

  • Computational cost: Unsupervised calibration leveraging optimization (e.g., in the kernelized setup) introduces nontrivial computational overhead, particularly as the number of classes or calibration samples grows.
  • Trade-off in coverage: Replacing labels with predictions may degrade the guarantee by the model’s inaccuracy rate. The coverage bound $1-\alpha-\beta$ is sharp.
  • Calibration under strong dependence or adversarial contamination: Performance guarantees degrade as the dependence or contamination severity increases; additional corrections or slack are required to avoid undercoverage.

7. Unified Perspective and Practical Implications

Split conformal prediction with unsupervised calibration constitutes a powerful, distribution-free approach to uncertainty quantification, requiring minimal model assumptions and offering tangible guarantees even in semi-supervised, non-IID, or data-limited scenarios. Core advantages include computational efficiency, adaptability to arbitrary predictive models, and robustness under moderate violations of exchangeability or data contamination, a robustness that can be further strengthened by domain adaptation or optimal transport-based weighting. The framework is applicable to both regression and classification, and adapts to function-valued and structured data. By leveraging empirical quantiles, weighting strategies, or auxiliary objectives (e.g., self-supervised loss), unsupervised calibration mechanisms ensure that prediction sets or distributions remain reliable and informative for real-world deployment where labeled resources are constrained.

A summary of the principal split conformal prediction with unsupervised calibration methodologies and guarantees is provided in the table below:

Methodology | Calibration Data Type | Key Guarantee / Feature
--- | --- | ---
Split-conformal with labels | Labeled | $P(\text{coverage}) \geq 1-\alpha$
Unsupervised / pseudo-label | Unlabeled | $P(\text{coverage}) \geq 1-\alpha-\beta$
Weighted / optimized calibration | Unlabeled | Coverage gap $O(\text{model error} + \text{complexity})$
Self-supervised augmentation | Labeled / Unlabeled | Improved adaptive efficiency; maintains finite-sample guarantees
Cross-conformal methods | Labeled / Unlabeled | Improved data efficiency; approximate marginal coverage

This comprehensive framework demonstrates the versatility of split conformal prediction with unsupervised calibration, highlighting its theoretical rigor, practical adaptability, and the fundamental role of calibration set construction—be it labeled, weighted, or pseudo-labeled—in reliable, model-agnostic uncertainty quantification.
