Conformal In-Context Learning (CICLe)
- CICLe is a framework that combines conformal prediction with large model in-context learning to construct prediction sets with formal, distribution-free coverage guarantees.
- It leverages efficient, batched forward passes to compute conformity scores, reducing computational complexity from O(n·K) to O(K) and lowering prompt length by up to 50%.
- Applications span regression, classification, and vision tasks, with empirical validations demonstrating robust coverage, competitive predictive performance, and practical efficiency.
Conformal In-Context Learning (CICLe) refers to a family of frameworks that integrate conformal prediction—a statistical methodology for constructing prediction sets with distribution-free coverage guarantees—with the in-context learning capability of large models, including transformers and LLMs. CICLe provides principled uncertainty quantification, enabling construction of prediction intervals or sets with formal guarantees while drastically improving computational and prompt efficiency. Applications include regression, classification, and prompt selection in NLP and vision tasks.
1. Theoretical Foundations and Core Methods
CICLe relies on the exchangeability property, requiring that calibration and inference examples are drawn i.i.d. or, more generally, that their joint law is permutation-invariant. For input–output pairs $(X_i, Y_i) \sim P$, conformal prediction yields a set $C_\alpha(X_{n+1})$ such that
$$\mathbb{P}\big(Y_{n+1} \in C_\alpha(X_{n+1})\big) \;\ge\; 1 - \alpha,$$
with no assumptions on $P$ beyond exchangeability (Huang et al., 22 Apr 2025).
In the standard regression setting, CICLe constructs conformity scores, typically absolute residuals $s_i = |y_i - \hat{y}_i|$, by leveraging the in-context prediction of a pre-trained transformer. A p-value (typicalness) for each candidate $y$ is defined as
$$p(y) \;=\; \frac{1 + \big|\{\, i \le n : s_i \ge s_{n+1}(y) \,\}\big|}{n + 1},$$
and the conformal set is $C_\alpha(x_{n+1}) = \{\, y : p(y) > \alpha \,\}$, or equivalently a thresholded interval when the output space $\mathcal{Y}$ is discretized.
For classification, split (inductive) conformal prediction is used: labelwise conformity scores are derived from a base model's predicted probabilities, and for each test input $x$ the conformal set $C_\alpha(x) = \{\, k : 1 - \hat{p}_k(x) \le \hat{q} \,\}$ is computed, where $\hat{p}_k(x)$ denotes the base model's probability for class $k$ and $\hat{q}$ is the finite-sample-corrected $(1-\alpha)$ quantile of calibration scores (Pantelidis et al., 5 Dec 2025, Randl et al., 18 Mar 2024).
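The split-conformal construction can be made concrete in a few lines. The sketch below assumes the common nonconformity score $1 - \hat{p}_y(x)$ and uses illustrative names (`split_conformal_sets`, `cal_probs`); it is not code from the cited papers.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split (inductive) conformal prediction sets for classification.

    A minimal sketch, assuming the score 1 - p_hat(y|x) computed from a base
    classifier's predicted probabilities; names are illustrative.
    """
    n = len(cal_labels)
    # Nonconformity on the calibration split: 1 minus the probability
    # assigned to the true class.
    cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of calibration scores.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(cal_scores, q_level, method="higher")
    # Prediction set: every class whose score falls below the threshold.
    return [np.where(1.0 - p <= q_hat)[0] for p in test_probs]
```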
2. Algorithmic Integration of Conformal and In-Context Learning
CICLe replaces expensive retraining in full conformal prediction with a single or batched forward pass of the in-context model. For regression:
- The context set $\{(x_i, y_i)\}_{i=1}^{n}$ plus the test input $x_{n+1}$ are encoded into a token matrix.
- For each candidate $y$ on a discretized grid $\mathcal{Y}$, the pre-trained model predicts outcomes for all $n+1$ points, and all conformity scores are computed in parallel.
- The conformal set is the collection of candidates $y$ whose test residual is less than or equal to the $\lceil (n+1)(1-\alpha) \rceil$-th smallest pooled residual (see the sketch below).
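A minimal sketch of this batched full-conformal loop, assuming a hypothetical in-context predictor `model(xs, ys, query_xs)` and absolute-residual conformity scores; in practice the candidate loop is executed as a single batched forward pass.

```python
import numpy as np

def cp_icl_regression_set(model, ctx_x, ctx_y, x_test, y_grid, alpha=0.1):
    """Full conformal set for in-context regression.

    A sketch under assumptions: `model(xs, ys, query_xs)` is a hypothetical
    wrapper returning in-context predictions for `query_xs` given the context
    (xs, ys); conformity scores are absolute residuals.
    """
    n = len(ctx_y)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank threshold for inclusion
    accepted = []
    for y_cand in y_grid:  # in practice: one batched pass over the whole grid
        aug_x = np.append(ctx_x, x_test)
        aug_y = np.append(ctx_y, y_cand)
        preds = model(aug_x, aug_y, aug_x)      # predictions for all n+1 points
        residuals = np.abs(aug_y - preds)       # pooled conformity scores
        # Keep the candidate if the test residual is within the k-th smallest.
        if residuals[-1] <= np.sort(residuals)[k - 1]:
            accepted.append(y_cand)
    return accepted
```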
In classification, CICLe leverages a base classifier (e.g., TF-IDF logistic regression) to output candidate sets with coverage guarantees, and builds compact in-context prompts for LLMs containing only these candidates. When the conformal set is a singleton ($|C_\alpha(x)| = 1$), the LLM is bypassed completely. In vision tasks, jackknife conformal prediction is used to filter training images for in-context selection (Wu et al., 30 Sep 2025).
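The resulting control flow is simple: query the LLM only when the base classifier's conformal set is ambiguous. A hypothetical sketch follows; the helpers `retrieve_examples` and `query_llm` and the prompt format are illustrative, not taken from the cited papers.

```python
def cicle_classify(conformal_set, retrieve_examples, query_llm, text):
    """Route a test input through the base classifier's conformal set.

    `retrieve_examples(text, cls)` and `query_llm(prompt)` are hypothetical
    helpers; the singleton shortcut follows the CICLe recipe described above.
    """
    if len(conformal_set) == 1:
        # Singleton set: the base classifier is confident enough; skip the LLM.
        return conformal_set[0]
    # Otherwise build a compact prompt containing only the candidate classes,
    # each illustrated by a few nearest training examples.
    shots = []
    for cls in conformal_set:
        shots.extend(retrieve_examples(text, cls))
    prompt = "\n".join(f"Text: {t}\nLabel: {c}" for t, c in shots)
    prompt += f"\nText: {text}\nLabel:"
    return query_llm(prompt)
```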
A distinct approach, E-ICL+FCP, meta-trains a permutation-invariant transformer with a conformal-aware loss to simulate the "virtual retrainings" required by full conformal prediction while needing only batched forward passes at inference (Deng et al., 1 Sep 2025). This model yields prediction sets with controlled coverage, smaller average set size, and improved efficiency relative to prior meta-in-context variants.
3. Practical Efficiency and Scaling Laws
CICLe frameworks offer substantial speed and compute advantages:
- Classical full conformal prediction retrains the model for each candidate prediction (prohibitively expensive at scale).
- CICLe runs a single batched forward pass per candidate $y$ or class $k$, avoiding retraining.
- In split conformal prediction, only one model is trained, but calibration-interval widths are inflated, especially with small calibration sets.
For transformer-based in-context regression, empirical results confirm that both the coverage (at the nominal $1-\alpha$ level) and the interval width of CP-ICL match oracle ridge-based full-conformal baselines, while reducing evaluation cost from $O(n \cdot K)$ (one refit per candidate) to $O(K)$ batched forward passes (Huang et al., 22 Apr 2025).
Scaling laws for CP-ICL predictive intervals adhere to a power-law form,
$$\mathrm{Width}(N, D) \;\approx\; E + A\,N^{-\alpha_N} + B\,D^{-\alpha_D},$$
where $N$ is model size, $D$ is pre-training token count, and the exponents $\alpha_N, \alpha_D$ quantify the diminishing returns of compute and data. The compute-optimal allocation of model size and data under a fixed FLOP budget $C$ takes the form
$$N^{\star} \propto C^{a}, \qquad D^{\star} \propto C^{b},$$
with the fitted exponents reported empirically in (Huang et al., 22 Apr 2025).
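Under the common approximation that training compute scales as $C \propto N D$, the compute-optimal split follows from constrained minimization of the power-law width above; this is the standard Chinchilla-style argument, not a result quoted from the cited paper.

```latex
% Assumption: Width(N, D) = E + A N^{-\alpha_N} + B D^{-\alpha_D}, compute budget C \propto N D.
\min_{N,\,D}\; A N^{-\alpha_N} + B D^{-\alpha_D}
\quad \text{s.t.} \quad N D \propto C
\;\;\Longrightarrow\;\;
N^{\star} \propto C^{\frac{\alpha_D}{\alpha_N + \alpha_D}}, \qquad
D^{\star} \propto C^{\frac{\alpha_N}{\alpha_N + \alpha_D}}.
```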
In classification, CICLe reduces prompt length and in-context example count (shots) by up to 49.9% and 54.6%, respectively (averaged), enabling smaller LLMs to approximate or outperform larger models (Pantelidis et al., 5 Dec 2025). In vision, conformal filtering ensures only statistically reliable prompts are included, reducing noise in global rankings (Wu et al., 30 Sep 2025).
4. Applications in NLP, Vision, and Beyond
CICLe has been applied in several modalities:
Text Classification:
Combining split conformal prediction with in-context LLM prompting for multi-class food risk classification, CICLe constructs adaptive prompts via:
- Generating conformal sets for each input.
- Selecting the nearest training examples per candidate class in the conformal set as in-context shots (a retrieval sketch follows this list).
- Only querying the LLM when $|C_\alpha(x)| > 1$ (Randl et al., 18 Mar 2024, Pantelidis et al., 5 Dec 2025).
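A sketch of the shot-selection step, under the assumption that retrieval uses TF-IDF cosine similarity (matching the base classifier's feature space); the function name and per-class shot budget are illustrative choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def nearest_examples_per_class(train_texts, train_labels, test_text,
                               candidate_classes, shots_per_class=2):
    """Pick the most similar training examples for each candidate class.

    A sketch using TF-IDF cosine similarity as the retrieval signal; the
    shot budget and function name are illustrative, not from the papers.
    """
    vec = TfidfVectorizer()
    train_mat = vec.fit_transform(train_texts)
    test_vec = vec.transform([test_text])
    sims = cosine_similarity(test_vec, train_mat).ravel()
    selected = []
    for cls in candidate_classes:
        idx = [i for i, y in enumerate(train_labels) if y == cls]
        idx.sort(key=lambda i: sims[i], reverse=True)
        selected.extend((train_texts[i], cls) for i in idx[:shots_per_class])
    return selected
```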
On benchmarks (AG News, DBpedia-14, Yahoo Answers, GoEmotions), CICLe surpasses base classifiers in macro-F1, with the most pronounced gains on rare and imbalanced classes. At high base-classifier confidence, LLM usage and prompt lengths are minimized (Pantelidis et al., 5 Dec 2025).
Vision:
For visual in-context learning, such as few-shot segmentation and detection, the RH-Partial2Global method applies jackknife conformal prediction to select a pool of reliable prompt examples before global ranking, outperforming existing similarity-based and random-sampling VICL approaches by margins of 0.62 mIoU in segmentation and 0.28 IoU in detection (Wu et al., 30 Sep 2025).
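A generic leave-one-out (jackknife) filtering sketch follows, assuming a hypothetical scalar nonconformity function `scores_fn`; the actual reliability measure and ranking pipeline of RH-Partial2Global differ in detail.

```python
import numpy as np

def jackknife_filter(scores_fn, candidates, alpha=0.1):
    """Leave-one-out (jackknife) conformal filtering of candidate prompts.

    A generic sketch, not the exact RH-Partial2Global procedure: `scores_fn`
    is a hypothetical callable returning a nonconformity score for one
    candidate given the remaining pool (a list of candidates).
    """
    n = len(candidates)
    loo_scores = np.array([
        scores_fn(candidates[i], candidates[:i] + candidates[i + 1:])
        for i in range(n)
    ])
    # Keep candidates whose leave-one-out score is within the
    # finite-sample-corrected (1 - alpha) quantile of the pool.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(loo_scores, q_level, method="higher")
    return [c for c, s in zip(candidates, loo_scores) if s <= q_hat]
```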
Regression and Meta-Learning:
CP-ICL has been validated on synthetic linear regression, matching coverage and interval width of oracle ridge regression with substantially better runtime. E-ICL+FCP extends this to permutation-invariant meta-learned settings, achieving tight predictive sets and minimal inference cost (Huang et al., 22 Apr 2025, Deng et al., 1 Sep 2025).
5. Empirical Findings, Limitations, and Recommendations
Empirical studies across modalities reveal:
- For regression, CP-ICL and full-conformal ridge regression achieve the target coverage rapidly as the number of in-context examples $n$ increases, while split CP can under-cover when the calibration set is small (Huang et al., 22 Apr 2025).
- Classification experiments show that CICLe attains the nominal conformal coverage guarantees and selects compact prediction sets, which is especially beneficial under high class imbalance (Pantelidis et al., 5 Dec 2025, Randl et al., 18 Mar 2024).
- Vision methods integrating jackknife conformal filtering show robust absolute gains over pure similarity ranking (Wu et al., 30 Sep 2025).
Key limitations and recommendations:
- CICLe in regression is validated only on linear synthetic data; its extension to non-linear or autoregressive models remains open (Huang et al., 22 Apr 2025, Deng et al., 1 Sep 2025).
- The exchangeability condition necessitates symmetric attention mechanisms; applications to causal (autoregressive) transformers require advanced split- or cross-conformal methods.
- For very small calibration sets, grid discretization resolution and boundary corrections are crucial for valid coverage (Huang et al., 22 Apr 2025).
- In vision, the conservatism of jackknife CP may reduce the pool size for prompt selection, especially in small-data regimes (Wu et al., 30 Sep 2025).
- In text, shot selection strategies based solely on TF-IDF may be enhanced by leveraging dense or contrastive embeddings (Pantelidis et al., 5 Dec 2025).
6. Extensions and Future Directions
Several avenues for extension have been identified:
- Application of CICLe to structured output tasks, multilabel or hierarchical classification, and cross-lingual scenarios.
- Integration of learnable conformity functions (for vision: meta-learned prompt reliability scores).
- Use of hierarchical conformal scoring to handle very large label spaces (Deng et al., 1 Sep 2025).
- Combining Bayesian or ensemble uncertainties with conformal filtering for tighter and more adaptive prediction sets (Wu et al., 30 Sep 2025).
- Investigation of the conformity level $\alpha$ for finer control of the trade-off between efficiency (prompt length, inference cost) and coverage guarantees.
- Extension to real-world nonlinear data and language–vision in-context learning tasks.
7. Summary Table: CICLe Variants and Main Claims
| Variant | Domain | Conformal Set Construction | Attention Type | Empirical Efficiency |
|---|---|---|---|---|
| CP-ICL (Huang et al., 22 Apr 2025) | Regression | Full (interval) via one-pass batch | Symmetric (non-causal) | $O(K)$ fwd passes; full coverage |
| CICLe (Randl et al., 18 Mar 2024, Pantelidis et al., 5 Dec 2025) | Classification | Split (inductive), base classifier | N/A | >50% prompt reduction; robust F1 |
| E-ICL+FCP (Deng et al., 1 Sep 2025) | Multiclass/Meta | Full, CP-aware trained ICL model | Perm.-invariant | Tightest sets, lowest latency |
| RH-Partial2Global (Wu et al., 30 Sep 2025) | Vision/VICL | Jackknife (global prompt filter) | Visual foundation model | Up to +0.62 mIoU, improved ranking |
CICLe serves as a general-purpose, modular recipe for infusing modern in-context learning architectures with distribution-free uncertainty quantification, offering competitive or superior performance to prior conformal and in-context baselines, with substantial gains in computational and data efficiency.