Adaptive Conformal Inference Overview

Updated 4 July 2026

Adaptive Conformal Inference is an online method that recalibrates prediction sets to meet target miscoverage levels even under arbitrary distribution shifts.
It employs techniques like dynamic step-size tuning, online gradient updates, and parameter-free betting to robustly adjust to nonstationary data.
Extensions move beyond the classical exchangeability setting to multi-step forecasting, causal inference, and safety-critical runtime assurance.

Adaptive Conformal Inference (ACI) is a conformal prediction method for sequential, nonstationary environments. In its canonical form, it replaces the fixed nominal miscoverage level with a single scalar parameter updated online so that prediction errors occur at the target long-run frequency, even when exchangeability fails and the data-generating distribution shifts arbitrarily over time (Gibbs et al., 2021). Subsequent work has expanded this core idea in several directions: dynamic step-size tuning for arbitrary distribution shifts (Gibbs et al., 2022), parameter-free online convex optimization via betting (Podkopaev et al., 2024), multi-step time-series forecasting (Szabadváry, 2024), causal pseudo-outcome calibration under temporal dependence (Koukorinis et al., 29 Jun 2026), and broader adaptive constructions that localize, stratify, or otherwise reshape conformal sets to the structure of the problem (Harris et al., 28 Jul 2025).

1. Canonical online recalibration

The canonical ACI setup is online prediction with observations

$\{(X_t,Y_t)\}_{t\in\mathbb N},$

where, at time $t$ , the learner uses past data and the new covariate $X_t$ to output a prediction set $\hat C_t$ . ACI is a wrapper around any black-box predictive model that yields either a point forecast $\hat\mu_t(X_t)$ or estimated conditional quantiles $\hat q_t(X_t;p)$ . Given a score or nonconformity function $S_t(x,y)$ and a calibration quantile function $\hat Q_t$ , the prediction set is

$\hat C_t(\alpha_t)=\{y:S_t(X_t,y)\le \hat Q_t(1-\alpha_t)\}.$

The adaptive parameter $\alpha_t$ is the current effective miscoverage level fed into the conformal quantile; smaller $\alpha_t$ yields larger prediction sets, and larger $\alpha_t$ yields smaller ones (Gibbs et al., 2021).

The update rule is

$\alpha_{t+1}=\alpha_t+\gamma(\alpha-\mathrm{err}_t),\qquad \mathrm{err}_t=\mathbf 1\{Y_t\notin\hat C_t(\alpha_t)\},$

with target miscoverage $\alpha$ and step size $\gamma>0$. If an error occurs, $\alpha_t$ decreases and the next set widens; if no error occurs, $\alpha_t$ increases and the next set narrows. The paper introducing ACI also gives a weighted or local-memory variant,

$\alpha_{t+1}=\alpha_t+\gamma\Big(\alpha-\sum_{s=1}^{t}w_s\,\mathrm{err}_s\Big),$

with weights increasing toward recent times. In that formulation, ACI can use symmetric residual scores, conformalized quantile regression scores, or any other score-based conformal set construction (Gibbs et al., 2021).
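The recursion can be illustrated with a minimal Python sketch. It assumes symmetric residual scores $|Y_t-\hat\mu_t(X_t)|$ and a simple empirical quantile over all past scores; the helper names `empirical_quantile` and `run_aci` are hypothetical, and the toy inputs are made up for illustration.

```python
def empirical_quantile(scores, level):
    """Empirical quantile of past scores, with ACI's conventions at the edges."""
    if level <= 0.0:
        return float("-inf")   # alpha_t >= 1: empty set, a miss is certain
    if level >= 1.0 or not scores:
        return float("inf")    # alpha_t <= 0 (or no history yet): cover everything
    ordered = sorted(scores)
    k = min(len(ordered) - 1, int(level * len(ordered)))
    return ordered[k]


def run_aci(preds, ys, alpha=0.1, gamma=0.01):
    """Return the sequence of alpha_t values and miss indicators err_t."""
    alpha_t, scores, alphas, errs = alpha, [], [], []
    for mu, y in zip(preds, ys):
        radius = empirical_quantile(scores, 1.0 - alpha_t)  # Q_t(1 - alpha_t)
        score = abs(y - mu)                                  # S_t(X_t, Y_t)
        err = 1 if score > radius else 0                     # Y_t outside the set
        alphas.append(alpha_t)
        errs.append(err)
        alpha_t += gamma * (alpha - err)                     # ACI update
        scores.append(score)
    return alphas, errs


# Toy run: the third observation is an outlier and triggers a miss.
alphas, errs = run_aci([0.0] * 5, [1.0, 1.0, 5.0, 1.0, 1.0], alpha=0.1, gamma=0.05)
```

In the toy run the level drifts upward while the sets cover and drops sharply after the single miss, which is the widening behavior described above.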

A central interpretive device is the latent optimal level

$\alpha_t^*=\sup\{\beta\in[0,1]:M_t(\beta)\le\alpha\},$

where

$M_t(\beta)=\mathbb P\big(S_t(X_t,Y_t)>\hat Q_t(1-\beta)\big).$

ACI can be understood as tracking this moving target $\alpha_t^*$. This perspective becomes especially important once later work reinterprets the update as online gradient descent on the pinball loss and then modifies the optimization mechanism rather than the conformal set itself.

2. Validity targets and the limits of “coverage”

ACI changes the validity notion. Classical split conformal inference under exchangeability aims at finite-sample marginal coverage of the form

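$\mathbb P\big(Y_{n+1}\in\hat C_n(X_{n+1})\big)\ge 1-\alpha.$

ACI instead targets long-run empirical calibration: $\frac1T\sum_{t=1}^{T}\mathrm{err}_t\to\alpha$ as $T\to\infty$, where $\mathrm{err}_t=\mathbf 1\{Y_t\notin\hat C_t(\alpha_t)\}$. The basic finite-time bound is

$\left|\frac1T\sum_{t=1}^{T}\mathrm{err}_t-\alpha\right|\le\frac{\max\{\alpha_1,1-\alpha_1\}+\gamma}{T\gamma},$

hence

$\lim_{T\to\infty}\frac1T\sum_{t=1}^{T}\mathrm{err}_t=\alpha\quad\text{almost surely}.$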

This is assumption-free with respect to the data-generating process, but it is a time-averaged pathwise guarantee rather than exact one-step marginal validity (Gibbs et al., 2021).
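Because the bound is pathwise, it can be checked numerically on any sequence. The sketch below is illustrative only: it assumes a rolling-window empirical quantile, a synthetic score stream whose scale drifts upward, and the usual convention that the set is empty when $\alpha_t\ge 1$ and unbounded when $\alpha_t\le 0$. The function names and the window length are hypothetical.

```python
import random


def aci_errors(scores, alpha, gamma, window=200):
    """Run ACI on a stream of nonconformity scores and return miss indicators."""
    alpha_t, errs = alpha, []
    for t, s in enumerate(scores):
        past = sorted(scores[max(0, t - window):t])
        if alpha_t >= 1.0:
            covered = False                 # empty set: certain miss
        elif alpha_t <= 0.0 or not past:
            covered = True                  # unbounded set: certain cover
        else:
            k = min(len(past) - 1, int((1.0 - alpha_t) * len(past)))
            covered = s <= past[k]
        err = 0 if covered else 1
        errs.append(err)
        alpha_t += gamma * (alpha - err)    # ACI update
    return errs


def coverage_gap_and_bound(errs, alpha, gamma, alpha_1):
    """Realized |mean(err) - alpha| and the finite-time bound on it."""
    T = len(errs)
    gap = abs(sum(errs) / T - alpha)
    bound = (max(alpha_1, 1.0 - alpha_1) + gamma) / (T * gamma)
    return gap, bound


rng = random.Random(0)
# Scores whose scale grows over time, so the stream is clearly non-exchangeable.
scores = [abs(rng.gauss(0.0, 1.0 + 0.002 * t)) for t in range(3000)]
errs = aci_errors(scores, alpha=0.1, gamma=0.01)
gap, bound = coverage_gap_and_bound(errs, alpha=0.1, gamma=0.01, alpha_1=0.1)
```

The comparison `gap <= bound` holds for every input stream, not just this one, which is what distinguishes the guarantee from a distributional statement. It says nothing about coverage at any individual time step.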

This distinction persists in later extensions. In multi-step online forecasting, the natural guarantee is horizon-wise and overall time-averaged control of miss rates, because feedback is delayed and each horizon has its own adaptive significance level. The multi-step ACI extension obtains finite-sample, almost-sure, time-averaged coverage-frequency guarantees for each horizon and for the average error across horizons, but it is still not a classical exchangeable finite-sample guarantee for each forecasted value (Szabadváry, 2024).

The same caution applies more strongly in causal and latent-state variants. Doubly Robust Adaptive Conformal Inference calibrates on the observable doubly robust pseudo-outcome $\hat\psi_t$, not on the latent CATE $\tau(X_t)$ directly. Its headline theorem is a time-averaged or long-run validity statement with an explicit finite-$T$ coverage gap bound for pseudo-outcomes; it does not claim finite-sample $(1-\alpha)$-coverage for the latent CATE (Koukorinis et al., 29 Jun 2026). Likewise, the hidden-Markov particle-filtering variant targets aggregated posterior particle weight inside the set rather than direct finite-sample coverage of the hidden state itself (Su et al., 2024).

A common misconception is therefore that “adaptive” simply means “exchangeability-free conformal prediction with the same guarantees.” The literature is explicit that what is usually retained under non-exchangeability is long-run calibration, local interval regret, aggregated posterior-mass control, or asymptotically conservative containment, depending on the problem. Exact conditional coverage is generally not the object being proved.

3. Optimization viewpoint, step-size sensitivity, and game-theoretic reformulations

A major line of work reinterprets ACI as an online optimization method. In the arbitrary-shift setting, ACI can be written as online gradient descent on the pinball loss. With

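$\beta_t=\sup\{\beta:Y_t\in\hat C_t(\beta)\},\qquad \ell(\beta_t,\theta)=\alpha(\beta_t-\theta)-\min\{0,\beta_t-\theta\},$

the update

$\alpha_{t+1}=\alpha_t-\gamma\,\nabla_\theta\ell(\beta_t,\alpha_t)=\alpha_t+\gamma(\alpha-\mathrm{err}_t)$

is equivalently an online gradient step on a pinball-loss objective. This exposes the main weakness of canonical ACI: performance depends strongly on the fixed step size $\gamma$ (Gibbs et al., 2022).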
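The equivalence can be verified directly. The sketch below assumes the pinball loss $\ell(\beta_t,\theta)=\alpha(\beta_t-\theta)-\min\{0,\beta_t-\theta\}$, where $\beta_t$ is the largest level at which $Y_t$ would still have been covered, so that a miss occurs exactly when $\alpha_t>\beta_t$. The function names are hypothetical.

```python
def pinball_loss(beta_t, theta, alpha):
    """Pinball loss of playing level theta when beta_t was the critical level."""
    return alpha * (beta_t - theta) - min(0.0, beta_t - theta)


def pinball_grad(beta_t, theta, alpha):
    """Subgradient in theta: 1{theta > beta_t} - alpha."""
    return (1.0 if theta > beta_t else 0.0) - alpha


def ogd_step(alpha_t, beta_t, alpha, gamma):
    """One online gradient descent step on the pinball loss."""
    return alpha_t - gamma * pinball_grad(beta_t, alpha_t, alpha)


def aci_step(alpha_t, beta_t, alpha, gamma):
    """One ACI step, with err_t = 1{alpha_t > beta_t}."""
    err = 1.0 if alpha_t > beta_t else 0.0
    return alpha_t + gamma * (alpha - err)


# The two updates agree whether the round was a miss (0.5 > 0.3) or a cover.
miss = (ogd_step(0.5, 0.3, 0.1, 0.05), aci_step(0.5, 0.3, 0.1, 0.05))
cover = (ogd_step(0.05, 0.3, 0.1, 0.05), aci_step(0.05, 0.3, 0.1, 0.05))
```

Seen this way, $\gamma$ is an ordinary learning rate, which is why the step-size sensitivity discussed here mirrors the familiar tuning problem in online gradient descent.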

“Conformal Inference for Online Prediction with Arbitrary Distribution Shifts” introduces dynamically-tuned adaptive conformal inference (DtACI), which runs many ACI instances in parallel, each with a different candidate step size, and then adaptively aggregates them using an online expert scheme with a share mechanism. The resulting method is designed to be adaptive to both the size and type of the distribution shift, and its theory is phrased through dynamic regret on all local time intervals of a given width. This local perspective is central: global average coverage can be misleading in a drifting environment, whereas small regret on every local interval makes the procedure responsive to abrupt shifts and stable under slow drift (Gibbs et al., 2022).
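A simplified sketch of the aggregation idea follows. It is not the paper's exact algorithm: the aggregation rate `eta` and share rate `sigma` are fixed by hand here (the paper derives them from the interval width), the output is the weighted mean of the experts, and the input is assumed to be the sequence of critical levels $\beta_t$, that is, the largest level at which each $Y_t$ would have been covered. All names are hypothetical.

```python
import math


def pinball(beta_t, theta, alpha):
    return alpha * (beta_t - theta) - min(0.0, beta_t - theta)


def dtaci(betas, alpha=0.1, gammas=(0.001, 0.01, 0.1), eta=2.0, sigma=0.01):
    """Aggregate several ACI experts with exponential weights and a share step."""
    k = len(gammas)
    experts = [alpha] * k          # one alpha_t per candidate step size
    weights = [1.0 / k] * k
    outputs = []
    for beta_t in betas:
        # Play the weighted average of the experts' current levels.
        outputs.append(sum(w * a for w, a in zip(weights, experts)))
        # Exponential reweighting by each expert's pinball loss.
        scaled = [w * math.exp(-eta * pinball(beta_t, a, alpha))
                  for w, a in zip(weights, experts)]
        mass = sum(scaled)
        # Share step: mix in a little uniform weight so that no expert dies out.
        # Dividing by mass keeps the weights normalized to sum to one.
        weights = [((1.0 - sigma) * s + sigma * mass / k) / mass for s in scaled]
        # Each expert takes its own ACI step with its own step size.
        experts = [a + g * (alpha - (1.0 if a > beta_t else 0.0))
                   for a, g in zip(experts, gammas)]
    return outputs


# Stationary toy stream: the critical level is always 0.3.
outputs = dtaci([0.3] * 2000)
```

The share step is what allows the weights to move back toward a large step size after an abrupt shift, even if a small step size had dominated during a long quiet period.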

“Adaptive Conformal Inference by Betting” replaces learning-rate-sensitive gradient updates with parameter-free betting or online convex optimization. For symmetric intervals

$\hat C_t(r_t)=[\hat\mu_t(X_t)-r_t,\ \hat\mu_t(X_t)+r_t],$

the minimal covering radius $r_t^*=|Y_t-\hat\mu_t(X_t)|$ induces the pinball-loss subgradient

$g_t=\alpha-\mathbf 1\{r_t^*>r_t\}.$

Instead of choosing a learning rate, the method updates a bettor’s wealth and betting fraction, producing a radius sequence with sublinear regret and asymptotic long-run miscoverage control under bounded nonconformity scores. The target criterion remains

$\lim_{T\to\infty}\frac1T\sum_{t=1}^{T}\mathbf 1\{Y_t\notin\hat C_t(r_t)\}=\alpha,$

but the algorithm removes manual step-size tuning (Podkopaev et al., 2024).
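The betting mechanism can be sketched with a Krichevsky-Trofimov (KT) bettor, a standard parameter-free scheme used here as a stand-in; the paper's exact bettor may differ. The sketch assumes scores (minimal covering radii) bounded in $[0,1]$, takes the coin outcome to be the negative subgradient $\mathrm{err}_t-\alpha$, and sets the radius to the betting fraction times the current wealth. All names are hypothetical.

```python
import random


def kt_betting_radii(scores, alpha=0.1, initial_wealth=1.0):
    """Choose interval radii by KT coin betting; no learning rate is needed."""
    wealth, coin_sum = initial_wealth, 0.0
    radii, errs, wealths = [], [], []
    for t, s in enumerate(scores, start=1):
        fraction = coin_sum / t        # KT betting fraction, always in (-1, 1)
        radius = fraction * wealth     # amount bet = radius played this round
        err = 1 if s > radius else 0   # miss if the needed radius exceeds ours
        coin = err - alpha             # negative pinball subgradient, in [-1, 1]
        wealth += coin * radius        # wealth gained or lost on the bet
        coin_sum += coin
        radii.append(radius)
        errs.append(err)
        wealths.append(wealth)
    return radii, errs, wealths


rng = random.Random(0)
scores = [rng.random() for _ in range(5000)]   # bounded scores in [0, 1)
radii, errs, wealths = kt_betting_radii(scores, alpha=0.1)
miss_rate = sum(errs) / len(errs)
```

A run of misses makes the cumulative coin sum positive, which raises both the betting fraction and the wealth, so the radius grows quickly without any tuned step size; a run of covers shrinks it again.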

A more structural reformulation appears in “Blackwell’s Approachability for Sequential Conformal Inference.” There, ACI is cast as a repeated vector-valued finite game with payoff

$\big(\mathbf 1\{Y_t\notin\hat C_t\},\ |\hat C_t|\big),$

whose two coordinates are miscoverage and interval length. This yields a precise characterization of attainable coverage-efficiency tradeoffs. The resulting BOACI algorithm combines calibrated forecasting of the opponent’s next action with an oracle-valid best response. In the fully adversarial setting, the paper proves that meaningful efficiency guarantees are impossible without further restrictions; under statistical restrictions on the opponent, BOACI approaches the corresponding target set and can recover classical exchangeable efficiency asymptotically (Principato et al., 17 Oct 2025).

4. Dependence, causality, and delayed feedback

Time dependence has produced several specialized ACI variants. In online multi-step-ahead forecasting, one predicts a vector

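$(Y_{t+1},\dots,Y_{t+H})$

and receives feedback with delay. The multi-step extension maintains a separate adaptive significance level for each horizon,

$\alpha_t^{(1)},\dots,\alpha_t^{(H)},$

updated by

$\alpha_{t+1}^{(h)}=\alpha_t^{(h)}+\gamma\big(\alpha-\mathrm{err}_t^{(h)}\big),\qquad h=1,\dots,H,$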

where the error vector is assembled from the diagonals of stored lower and upper prediction-bound matrices. This preserves ACI-style finite-sample, almost-sure, time-averaged guarantees for each horizon and for the overall average error rate (Szabadváry, 2024).
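The delayed-feedback bookkeeping can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses a persistence forecast (the last observed value) as a placeholder model, symmetric intervals, and per-horizon queues of pending forecasts in place of the stored bound matrices. All names are hypothetical.

```python
import random
from collections import deque


def radius_for(scores, alpha_h):
    """Interval radius at level alpha_h from one horizon's resolved scores."""
    if alpha_h >= 1.0:
        return float("-inf")       # empty set
    if alpha_h <= 0.0 or not scores:
        return float("inf")        # unbounded set
    ordered = sorted(scores)
    k = min(len(ordered) - 1, int((1.0 - alpha_h) * len(ordered)))
    return ordered[k]


def multistep_aci(ys, horizons=3, alpha=0.1, gamma=0.02):
    """Per-horizon ACI; each forecast is scored only once its target arrives."""
    alphas = [alpha] * horizons                    # one level per horizon
    history = [[] for _ in range(horizons)]        # resolved scores per horizon
    pending = [deque() for _ in range(horizons)]   # (target_time, point, radius)
    errs = [[] for _ in range(horizons)]
    for t, y in enumerate(ys):
        # 1) Delayed feedback: resolve forecasts whose target time is now.
        for h in range(horizons):
            if pending[h] and pending[h][0][0] == t:
                _, point, radius = pending[h].popleft()
                score = abs(y - point)
                err = 1 if score > radius else 0
                errs[h].append(err)
                history[h].append(score)
                alphas[h] += gamma * (alpha - err)
        # 2) Issue new forecasts for targets t+1, ..., t+horizons.
        for h in range(horizons):
            pending[h].append((t + h + 1, y, radius_for(history[h], alphas[h])))
    return errs, alphas


rng = random.Random(1)
series, level = [], 0.0
for _ in range(2000):
    level += rng.gauss(0.0, 1.0)   # random walk, so longer horizons are harder
    series.append(level)
errs, alphas = multistep_aci(series, horizons=3, alpha=0.1, gamma=0.02)
rates = [sum(e) / len(e) for e in errs]
```

Each horizon keeps its own score history, so longer horizons naturally end up with wider intervals while every horizon's miss rate is steered toward the same target.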

In causal inference under temporal dependence, DR-ACI combines three ingredients that are usually studied separately: doubly robust pseudo-outcomes, adaptive conformal recalibration, and time-series dependence control under a mixing condition. The pseudo-outcome, written here in the standard augmented inverse-probability-weighted form,

$\hat\psi_t=\hat m_1(X_t)-\hat m_0(X_t)+\frac{A_t\,(Y_t-\hat m_1(X_t))}{\hat e(X_t)}-\frac{(1-A_t)\,(Y_t-\hat m_0(X_t))}{1-\hat e(X_t)},$

is observable and centered on $\tau(X_t)$ under correct nuisance specification; here $A_t$ is the treatment indicator, $\hat m_a$ are the estimated outcome regressions, and $\hat e$ is the estimated propensity score. Conformal scores are

$S_t=|\hat\psi_t-\hat\tau_t(X_t)|,$

and online calibration uses the ACI recursion

$\alpha_{t+1}=\alpha_t+\gamma(\alpha-\mathrm{err}_t).$

The theory gives a coverage-gap decomposition with a mixing gap, a nuisance-bias tax, and an adaptation term. The guarantee is for pseudo-outcomes, not finite-sample exact coverage of the latent CATE (Koukorinis et al., 29 Jun 2026).
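The pseudo-outcome and score can be illustrated with a short sketch. It assumes the standard augmented inverse-probability-weighted (AIPW) form of a doubly robust pseudo-outcome, with binary treatment `a`, outcome-regression values `mu1` and `mu0`, and propensity `e`; the paper's exact construction and notation may differ, and the function names are hypothetical.

```python
def dr_pseudo_outcome(y, a, mu1, mu0, e):
    """AIPW pseudo-outcome for one unit; a is 0 or 1 and 0 < e < 1."""
    return (mu1 - mu0) + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1.0 - e)


def dr_score(y, a, mu1, mu0, e, tau_hat):
    """Absolute residual between the pseudo-outcome and the CATE estimate."""
    return abs(dr_pseudo_outcome(y, a, mu1, mu0, e) - tau_hat)


# Treated unit whose outcome exceeds its regression value by 1.0.
psi = dr_pseudo_outcome(y=4.0, a=1, mu1=3.0, mu0=1.0, e=0.5)
score = dr_score(y=4.0, a=1, mu1=3.0, mu0=1.0, e=0.5, tau_hat=2.0)
```

The score is computed entirely from observables, which is why ACI calibration is possible here even though the CATE itself is never observed. The same fact explains why the resulting guarantee concerns the pseudo-outcome.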

For latent-state inference in hidden Markov models, the key obstacle is the absence of observable hidden-state labels for calibration. “Adaptive Conformal Inference by Particle Filtering under Hidden Markov Models” replaces the unavailable hidden state by a weighted particle approximation of the filtering posterior. Coverage is redefined as the total posterior particle weight inside the set, and realized aggregated miscoverage is

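$\mathrm{err}_t=\sum_{i=1}^{N}w_t^{(i)}\,\mathbf 1\{x_t^{(i)}\notin\hat C_t(\alpha_t)\}.$

Here $x_t^{(i)}$ are the particles and $w_t^{(i)}$ their normalized weights. The adaptive update becomes

$\alpha_{t+1}=\alpha_t+\gamma(\alpha-\mathrm{err}_t),$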

with an analogous horizon-specific version for multi-step prediction. The validity target is long-run average aggregated coverage, not direct latent-state coverage in the classical conformal sense (Su et al., 2024).
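A minimal sketch of the aggregated error signal, assuming a scalar hidden state, an interval-shaped set, and a list of weighted particles approximating the filtering posterior. The function names are hypothetical, and the particle filter itself is taken as given.

```python
def aggregated_miscoverage(particles, weights, lower, upper):
    """Share of posterior particle weight that falls outside [lower, upper]."""
    total = sum(weights)
    outside = sum(w for x, w in zip(particles, weights)
                  if not (lower <= x <= upper))
    return outside / total


def particle_aci_step(alpha_t, err_t, alpha, gamma):
    """ACI update driven by a fractional error in [0, 1]."""
    return alpha_t + gamma * (alpha - err_t)


err_t = aggregated_miscoverage([0.0, 1.0, 2.0, 3.0], [0.1, 0.2, 0.3, 0.4],
                               lower=0.5, upper=2.5)
alpha_next = particle_aci_step(0.1, err_t, alpha=0.1, gamma=0.05)
```

Unlike the binary miss indicator in canonical ACI, the error here is a fraction between 0 and 1, so the update moves in proportion to how much posterior mass the set failed to capture.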

5. Broader forms of adaptivity in conformal inference

Current usage of “adaptive conformal inference” is broader than the original online threshold-update paradigm. In the recent literature, adaptivity can target groups, local neighborhoods, score geometry, noisy labels, or structured outputs, while keeping exchangeability-based calibration intact.

Method	What adapts	Validity statement
AFCP	Selected sensitive attribute	Finite-sample adaptive equalized coverage under exchangeability (Zhou et al., 2024)
LSCI	Localization weights and projection-depth score	Approximate finite-sample marginal coverage under local exchangeability (Harris et al., 28 Jul 2025)
CCLE ellipsoids	Covariance-aware multivariate conformity score	Standard split-conformal finite-sample marginal coverage with ellipsoids (Henderson et al., 2024)
Transductive adaptive scores	Score may use calibration+test covariates permutation-invariantly	DKW-type concentration for the empirical distribution of conformal $p$-values (Gazin et al., 2023)
Noise-adaptive classification	Calibration correction for noisy labels	Finite-sample marginal coverage for the clean label under known invertible transition matrix (Bortolotti et al., 29 Jan 2025)
MACI	Group-conditional calibration for claim filtering	Finite-sample group-conditional coverage that retained claims are factual (Noh et al., 1 Feb 2026)

“Conformal Classification with Equalized Coverage for Adaptively Selected Groups” is representative of group-adaptive conformalization. AFCP selects a sensitive attribute using leave-one-out pseudo-miscoverage indicators, then returns a union of marginal and selected-group conformal sets. The guarantee is finite-sample, distribution-free adaptive equalized coverage conditional on the adaptively selected attribute, rather than protection for all sensitive groups simultaneously (Zhou et al., 2024).

“Locally Adaptive Conformal Inference for Operator Models” generalizes adaptive conformal ideas to function-valued outputs. LSCI uses a localizer, weighted empirical projected residual laws, and projection-depth scores to form a test-input-dependent residual central region in function space. The resulting function-valued set has an approximate finite-sample marginal coverage guarantee under local exchangeability, with the coverage gap controlled by weighted local dissimilarity (Harris et al., 28 Jul 2025).

Two other directions adapt to imperfections in the calibration mechanism itself. “Transductive conformal inference with adaptive scores” allows arbitrary exchangeable scores, including scores that use the calibration and test covariates at training stage as long as the construction is permutation-invariant. Its key result is that the full vector of transductive conformal $p$-values has a universal Pólya-urn law, enabling DKW-type uniform concentration for the empirical distribution of $p$-values (Gazin et al., 2023). “Noise-Adaptive Conformal Classification with Marginal Coverage” instead adapts to random label noise: it estimates the discrepancy

$\Delta(t)=F(t)-\tilde F(t)$

between clean-label and noisy-label score distributions and corrects the calibration rule accordingly, obtaining finite-sample marginal coverage for the clean label under a class-conditional label-noise model with known invertible transition matrix (Bortolotti et al., 29 Jan 2025).

A further specialization appears in LLM factuality filtering. MACI does not output a prediction set over labels; it filters an LLM response so that every retained claim is factual with high probability. Its adaptive component is group-conditional calibration combined with a multiplicative conformity score over ordered claim-level factuality scores, and it guarantees group-conditional validity of the event that the retained claim set is contained in the true-claim set (Noh et al., 1 Feb 2026).

6. Runtime assurance, safety, and cost-aware adaptation

Safety-critical applications have driven another reinterpretation of ACI: the feedback signal need not be a bare miss indicator. “Cost-Aware Adaptive Conformal Inference for Runtime Assurance in Dynamic Environments” introduces the boosted loss

$\ell_t=\mathrm{err}_t\,(1+\lambda c_t),$

where $\mathrm{err}_t$ is the miscoverage indicator, $c_t\in[0,1]$ is a normalized violation cost, and $\lambda\ge 0$ sets how strongly cost boosts the step. The update becomes

$\alpha_{t+1}=\alpha_t+\gamma(\alpha-\ell_t).$

When $\lambda=0$, the method reduces to standard ACI; for $\lambda>0$, severe failures trigger larger negative steps than benign failures. The main theoretical consequence is a dual long-run guarantee that jointly controls the long-run violation frequency and the long-run average cumulative harm. In that sense, the method extends ACI from controlling only how often failures occur to controlling how harmful they are (Wu et al., 23 May 2026).
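The effect of a cost-boosted error signal can be sketched as below. The functional form is an assumption made for illustration: the loss is taken to be the miss indicator multiplied by $1+\lambda c_t$, with a normalized cost $c_t\in[0,1]$ and a boost parameter $\lambda\ge 0$; the paper's exact loss may differ. The function names are hypothetical.

```python
def boosted_loss(err, cost, lam):
    """Assumed form: miss indicator scaled up by the normalized violation cost."""
    return err * (1.0 + lam * cost)


def cost_aware_step(alpha_t, err, cost, alpha, gamma, lam):
    """ACI-style update driven by the boosted loss instead of the bare miss."""
    return alpha_t + gamma * (alpha - boosted_loss(err, cost, lam))


# Same miss, different severity: the costly failure widens the next set more.
benign = cost_aware_step(0.1, err=1, cost=0.1, alpha=0.1, gamma=0.05, lam=2.0)
severe = cost_aware_step(0.1, err=1, cost=0.9, alpha=0.1, gamma=0.05, lam=2.0)
plain = cost_aware_step(0.1, err=1, cost=0.9, alpha=0.1, gamma=0.05, lam=0.0)
```

Under this assumed form, setting `lam=0.0` recovers the standard ACI step regardless of cost, and covered rounds are unaffected because the boost multiplies the miss indicator.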

A related but distinct control application is Adaptive Conformal Filtering (ACoFi), which combines a learned Hamilton–Jacobi reachability-based safety filter with ACI. Here the one-sided nonconformity score is

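$s_t=\max\{0,\ \hat V_t^{\mathrm{pred}}-\hat V_t\},$

where $\hat V_t^{\mathrm{pred}}$ is the predicted and $\hat V_t$ the realized learned safety value, so only optimistic safety errors are penalized. The adaptive quantile $q_t$ induces a lower bound

$\hat V_{t+1}^{\mathrm{pred}}-q_t$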

on the next learned safety value, and the controller switches from the nominal policy to the learned safe policy whenever this conformally corrected lower bound is too small. The guarantee is explicitly a soft safety guarantee rather than a hard safety guarantee: the long-run rate of incorrectly quantifying uncertainty in the predicted safety of the nominal policy is asymptotically upper bounded by a user-defined parameter, but invariance in the true system is not claimed (Huriot et al., 20 Apr 2026).

These control papers broaden the operational meaning of “coverage.” In them, conformal adaptation regulates safety margins, switching criteria, or risk budgets inside a closed-loop controller. A plausible implication is that ACI is becoming a general online calibration primitive for sequential decision systems, not only a post hoc uncertainty wrapper.

7. Limitations, controversies, and open directions

The literature is unusually explicit about its own limitations. The first is the persistent gap between long-run validity and stronger finite-sample or conditional guarantees. Canonical ACI, multi-step ACI, betting-based ACI, and particle-filter ACI all target time-averaged error control rather than exact per-round coverage; this is a feature of the problem formulation, not merely a proof artifact (Gibbs et al., 2021).

The second is step-size sensitivity and related tuning burdens. Canonical ACI needs a step size $\gamma$, and its performance depends on the unknown speed of environmental change. DtACI and betting-based ACI exist precisely because a fixed learning rate can be too sluggish after abrupt shifts and too noisy under slow drift (Gibbs et al., 2022). Even so, the replacement hyperparameters in later methods do not eliminate the general calibration-versus-reactivity tradeoff.

The third is that empirical robustness often extends beyond current theory. In DR-ACI, the strongest robustness study combines dependence and drift, but drift lies outside the theorem’s strict stationarity and mixing assumptions. The paper is clear that this is a robustness experiment rather than a theorem-covered case (Koukorinis et al., 29 Jun 2026). Conversely, BOACI shows that without additional structure, adversarial validity and nontrivial efficiency cannot be achieved simultaneously. That impossibility result places a hard limit on what one should expect from sequential conformal methods in unrestricted environments (Principato et al., 17 Oct 2025).

The fourth is computational. BOACI requires a calibrated forecaster over a probability simplex, which the paper identifies as a deployment bottleneck (Principato et al., 17 Oct 2025). LSCI requires test-time sampling from implicit function-space regions (Harris et al., 28 Jul 2025). Groupwise and fairness-aware methods can become sample-inefficient when too many subgroups are protected at once (Zhou et al., 2024). Particle-filter and control variants inherit the computational cost of the underlying state-estimation or MPC machinery (Su et al., 2024).

Finally, several problem classes remain open or only partially addressed. DR-ACI notes that valid conformal prediction under long-range dependence remains open (Koukorinis et al., 29 Jun 2026). Noise-adaptive conformal classification proves finite-sample validity only when the label-noise transition matrix is known, whereas real-data experiments necessarily use plug-in estimates (Bortolotti et al., 29 Jan 2025). Group-conditional LLM filtering requires sensible group definitions and enough calibration data per group (Noh et al., 1 Feb 2026). More broadly, the literature repeatedly distinguishes marginal, group-conditional, local, posterior-mass, and pseudo-outcome validity. That plurality of targets is not terminological drift alone; it reflects the fact that “adaptive conformal inference” now names a family of methods whose guarantees are tailored to different non-exchangeable or structured regimes rather than a single universal replacement for classical conformal prediction.