Bellman Conformal Inference (BCI)

Updated 8 February 2026
  • Bellman Conformal Inference (BCI) is a framework that generates calibrated predictive intervals for univariate time series by leveraging dynamic programming to balance interval length and long-term coverage.
  • It formulates a one-dimensional stochastic control problem to optimally select interval parameters, acting as a robust wrapper around black-box forecasting models.
  • BCI enjoys rigorous non-asymptotic coverage guarantees, and empirical evaluations show it produces intervals up to 20% shorter than those from Adaptive Conformal Inference.

Bellman Conformal Inference (BCI) is a framework for producing calibrated predictive intervals for univariate time series by leveraging dynamic programming to minimize average interval length while maintaining rigorous long-term coverage guarantees. BCI operates as a wrapper around arbitrary black-box multi-step forecasting models, directly addressing the potential miscalibration of nominal prediction intervals provided by such models. At each step, BCI formulates and solves a tractable one-dimensional stochastic control problem to select interval parameters, delivering approximately calibrated intervals under arbitrary distribution shifts and temporal dependencies and yielding tighter prediction intervals compared to previous methods such as Adaptive Conformal Inference (ACI) (Yang et al., 2024).

1. Problem Formulation and Calibration Objective

Consider a univariate time series $(Y_1, Y_2, \dots)$, where $Y_t \in \mathcal{Y}$ is revealed only at time $t+1$, and let $\mathcal{F}_{t-1}$ denote all observable information up to time $t$. At each time $t$, a black-box forecaster provides, for each horizon $s \in \{t, \dots, t+T-1\}$, a nominal $(1-\beta)$-level prediction interval

$$C_{s|t}(1-\beta) \subseteq \mathcal{Y}, \qquad \beta \in [0,1],$$

with interval length $L_{s|t}(\beta) = |C_{s|t}(1-\beta)|$. Ideally $P(Y_s \in C_{s|t}(1-\beta)) = 1-\beta$, but in practice these nominal intervals may be poorly calibrated.

BCI selects a data-dependent miscoverage index $\alpha_t \in [0,1]$, adapted to $\mathcal{F}_{t-1}$, and issues the prediction interval $C_t := C_{t|t}(1-\alpha_t)$. Defining the indicator $\mathrm{err}_t = \mathbf{1}\{Y_t \notin C_t\}$, the calibration objective is strict long-run validity for a pre-specified target $\bar\alpha \in (0,1)$:

$$\limsup_{K\to\infty} \frac{1}{K}\sum_{t=1}^{K} \mathrm{err}_t \le \bar\alpha \quad \text{almost surely},$$

uniformly over data-generating processes, including adversarial or deterministic sequences. The only assumptions are that $C_{s|t}(1-\beta)$ is monotone in $\beta$ (with respect to set inclusion) and that $C_{s|t}(1) = \mathcal{Y}$ is the trivial full-space interval.
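As a concrete illustration (not from the paper), a forecaster that outputs a Gaussian predictive mean and scale induces nominal intervals satisfying both assumptions; the helper names `gaussian_interval` and `interval_length` below are invented for this sketch:

```python
import math
from statistics import NormalDist

def gaussian_interval(mu, sigma, beta):
    """Nominal (1 - beta) central interval from a Gaussian forecast.

    Monotone in beta under set inclusion, with beta = 0 yielding the
    trivial full-space interval C(1) -- the two BCI assumptions.
    """
    if beta <= 0.0:
        return (-math.inf, math.inf)
    z = NormalDist().inv_cdf(1.0 - beta / 2.0)  # standard-normal quantile
    return (mu - z * sigma, mu + z * sigma)

def interval_length(mu, sigma, beta):
    lo, hi = gaussian_interval(mu, sigma, beta)
    return hi - lo
```

Shrinking $\beta$ widens the interval monotonically, and $\beta = 0$ recovers the full space, so this forecaster is a valid input to BCI even if its coverage is miscalibrated.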

2. Stochastic Control Problem and Dynamic Programming

BCI addresses interval selection as a finite-horizon, one-dimensional stochastic control problem (SCP) at each time $t$. The objective is to choose $\{\alpha_{s|t}\}_{s=t}^{t+T-1}$ to minimize the expected total interval length plus a penalty on excess miscoverage:

$$\min_{\alpha_{t|t},\dots,\alpha_{t+T-1|t}} \mathbb{E}\Bigg[\sum_{s=t}^{t+T-1} L_{s|t}(\alpha_{s|t}) + \lambda_t \max\Bigg(\frac{1}{T}\sum_{s=t}^{t+T-1}\mathrm{err}_{s|t} - \bar\alpha,\; 0\Bigg)\Bigg].$$

Here, for each $s$, $\mathrm{err}_{s|t} = \mathbf{1}\{\alpha_{s|t} > \beta_{s|t}\}$, where $\beta_{s|t}$ is drawn from the analyst's empirical estimate $F_{s|t}$ of the future probability integral transform (PIT). The scalar weight $\lambda_t$ controls the tradeoff between short intervals and coverage. No constraints are required beyond $\alpha_{s|t} \in [0,1]$; the safeguard $C_{s|t}(1) = \mathcal{Y}$ ensures that if $\lambda_t \le 0$, the trivial solution (the maximal interval) is always achievable.

The SCP is solved via dynamic programming on the state $\rho_{s|t} = \sum_{k=t}^{s-1}\mathrm{err}_{k|t}$ (the number of miscoverages up to time $s-1$), with terminal cost at $s = t+T$:

$$J_{t+T|t}(\rho) = \lambda_t \max\left(\frac{\rho}{T} - \bar\alpha,\; 0\right).$$

The Bellman update admits explicit computation:

$$J_{s|t}(\rho) = \min_{\alpha\in[0,1]} \mathbb{E}_{\beta\sim F_{s|t}}\big[L_{s|t}(\alpha) + J_{s+1|t}(\rho + \mathbf{1}\{\alpha > \beta\})\big].$$

Writing $D_{s|t}(\rho) = J_{s+1|t}(\rho+1) - J_{s+1|t}(\rho)$ for the marginal cost of one additional miscoverage, the optimal action at $(s,\rho)$ is

$$\widetilde\alpha_{s|t}(\rho) = \arg\min_{\alpha\in[0,1]}\big\{L_{s|t}(\alpha) + D_{s|t}(\rho)\,F_{s|t}(\alpha)\big\}.$$

The action actually taken at time $t$ is $\alpha^*_{t|t} = \widetilde\alpha_{t|t}(0)$.
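The backward recursion can be sketched numerically by discretizing $\alpha$ and representing each $L_{s|t}$ and $F_{s|t}$ as a callable. This is an illustrative implementation under those assumptions, with all names invented here; stage indices are relative to $t$, so $s$ runs over $0,\dots,T-1$:

```python
import numpy as np

def solve_scp(lengths, pit_cdfs, lam, alpha_bar, T):
    """Backward dynamic program for one BCI planning step.

    lengths[s]  -- callable giving the nominal length L_{s|t}(alpha)
    pit_cdfs[s] -- callable giving the empirical PIT CDF F_{s|t}(alpha)
    Returns the first-horizon action alpha~_{t|t}(0).
    """
    grid = np.linspace(0.0, 1.0, 101)              # discretized actions
    # Terminal cost J_{t+T|t}(rho) = lam * max(rho / T - alpha_bar, 0)
    J = np.array([lam * max(r / T - alpha_bar, 0.0) for r in range(T + 1)])
    first_action = 0.0
    for s in reversed(range(T)):
        L = np.array([lengths[s](a) for a in grid])
        F = np.array([pit_cdfs[s](a) for a in grid])
        J_new = np.empty(s + 1)
        for rho in range(s + 1):                   # at stage s, rho <= s
            D = J[rho + 1] - J[rho]                # cost of one more miss
            vals = L + D * F + J[rho]              # Bellman objective
            k = int(np.argmin(vals))
            J_new[rho] = vals[k]
            if s == 0 and rho == 0:
                first_action = float(grid[k])
        J = J_new
    return first_action
```

With $\lambda_t = 0$ the penalty vanishes and the program simply minimizes length (picking the largest feasible $\alpha$), while a very large $\lambda_t$ drives the planned $\alpha$ toward zero and widens the intervals.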

3. Interval Construction and Online Updates

Once $\alpha_t$ is determined, the prediction interval for $Y_t$ is $[L_t, U_t] = C_{t|t}(1-\alpha_t)$. The "uncalibrated PIT" at time $t$ is

$$\beta_t = \sup\{\beta : Y_t \in C_{t|t}(1-\beta)\},$$

and miscoverage is encoded as $\mathrm{err}_t = \mathbf{1}\{\alpha_t > \beta_t\}$. The penalty weight $\lambda_t$ is updated via an online-gradient step,

$$\lambda_{t+1} = \lambda_t - \gamma\,[\bar\alpha - \mathrm{err}_t], \qquad \gamma = c\,\lambda_{\max},\quad c \in (0,1),$$

and whenever $\lambda_t > \lambda_{\max}$, BCI defaults to the full-space interval by truncating $\alpha_t$ to zero. By induction, this update keeps

$$\lambda_t \in \big[-\gamma\bar\alpha,\; \lambda_{\max} + \gamma(1-\bar\alpha)\big],$$

so that long-run miscoverage is controlled.
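A minimal sketch of this update, with invented function names and $\gamma = c\,\lambda_{\max}$ as above:

```python
def update_lambda(lam, err, alpha_bar, lam_max, c):
    """One online-gradient step on the penalty weight lambda_t.

    err is 1 if Y_t fell outside the issued interval, else 0.
    """
    gamma = c * lam_max
    return lam - gamma * (alpha_bar - err)

def choose_alpha(lam, lam_max, planned_alpha):
    """Truncate to the full-space interval when lambda_t > lambda_max."""
    return planned_alpha if lam <= lam_max else 0.0
```

A miss ($\mathrm{err}_t = 1$) raises $\lambda_t$ by $\gamma(1-\bar\alpha)$, making future intervals wider; a hit lowers it by $\gamma\bar\alpha$, allowing them to tighten.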

4. Algorithmic Workflow

BCI can be summarized in a stepwise form as follows:

  1. Input: previous $\lambda_{t-1}$ and $\mathrm{err}_{t-1}$; multi-step forecasts $\{L_{s|t}(\cdot), F_{s|t}\}_{s=t}^{t+T-1}$.
  2. Update the penalty parameter:

$$\lambda_t = \lambda_{t-1} - \gamma\,[\bar\alpha - \mathrm{err}_{t-1}].$$

  3. Solve the stochastic control problem via dynamic programming to obtain $\widetilde\alpha_{s|t}(\rho)$ for all $s$, $\rho$.
  4. Set

$$\alpha_t = \begin{cases} \widetilde\alpha_{t|t}(0), & \lambda_t \le \lambda_{\max} \\ 0, & \lambda_t > \lambda_{\max} \end{cases}$$

and output $C_t = C_{t|t}(1-\alpha_t)$.

  5. Observe $Y_t$, record $\mathrm{err}_t = \mathbf{1}\{Y_t \notin C_t\}$, and repeat.

This workflow requires only forward-looking forecasts (empirical PITs and nominal interval lengths), making it compatible with any off-the-shelf forecasting mechanism.
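Putting the steps together, here is a self-contained one-step ($T = 1$) simulation on synthetic $N(0,1)$ data with a matching Gaussian forecaster. Everything here (data, forecaster, constants) is invented for illustration; the paper's experiments use multi-step horizons and real series. For $T = 1$ the Bellman objective collapses to $L(\alpha) + \lambda_t(1-\bar\alpha)F(\alpha)$:

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()
rng = np.random.default_rng(0)

K, alpha_bar = 2000, 0.1            # rounds, target miscoverage
lam_max, c = 20.0, 0.05
gamma = c * lam_max
lam, errs, pit_history = 0.0, [], []

alphas = np.linspace(0.0, 1.0, 201)
# Nominal length L(alpha) = 2 * z_{1 - alpha/2} (sigma = 1);
# alpha = 0 gives the infinite full-space interval.
lengths = np.array([np.inf] + [2 * nd.inv_cdf(1 - a / 2) for a in alphas[1:]])

for t in range(K):
    if lam > lam_max:
        alpha_t = 0.0                # fall back to the full-space interval
    else:
        # Empirical PIT CDF F(alpha) from past betas (uniform before data)
        if pit_history:
            F = np.searchsorted(np.sort(pit_history), alphas) / len(pit_history)
        else:
            F = alphas
        # T = 1 Bellman objective: L(alpha) + lam * (1 - alpha_bar) * F(alpha)
        vals = lengths + lam * (1 - alpha_bar) * F
        alpha_t = alphas[int(np.argmin(vals))]
    y = rng.standard_normal()        # observe Y_t
    beta_t = 2 * (1 - nd.cdf(abs(y)))  # uncalibrated PIT
    err = float(alpha_t > beta_t)
    errs.append(err)
    pit_history.append(beta_t)
    lam = lam - gamma * (alpha_bar - err)  # online-gradient update

miscov = np.mean(errs)
print(f"average miscoverage: {miscov:.3f} (target {alpha_bar})")
```

By the telescoping argument of the next section, the realized miscoverage must lie within $(\lambda_{\max}+\gamma)/(K\gamma)$ of $\bar\alpha$ regardless of the random draws.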

5. Coverage Properties and Theoretical Guarantees

BCI establishes a non-asymptotic bound on average miscoverage. For any starting index $m \ge 0$ and any batch of $K$ rounds,

$$\left|\frac{1}{K}\sum_{t=m+1}^{m+K} \mathrm{err}_t - \bar\alpha\right| \le \frac{\lambda_{\max} + \gamma}{K\gamma} = \frac{c+1}{cK}.$$

Letting $K \to \infty$, this bound guarantees that

$$\limsup_{K\to\infty} \frac{1}{K}\sum_{t=1}^{K} \mathrm{err}_t \le \bar\alpha$$

almost surely, for any data sequence, regardless of stochasticity or stationarity. The approach imposes no assumptions on the forecaster or the underlying process.
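As worked arithmetic (the helper name is invented): with $c = 0.5$, the average miscoverage over any $K = 1000$ consecutive rounds deviates from $\bar\alpha$ by at most $(c+1)/(cK) = 3/1000$:

```python
def miscoverage_bound(c, K):
    """Non-asymptotic deviation bound (c + 1) / (c K) from above."""
    return (c + 1.0) / (c * K)
```

The bound shrinks at rate $1/K$ and tightens as $c$ grows, though larger $c$ means coarser updates to $\lambda_t$.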

6. Empirical Evaluation and Comparisons

Empirical assessments utilize datasets including daily logarithmic returns for stocks (e.g., AMD, Amazon, Nvidia), squared return volatility, and Google Trends queries (e.g., "deep learning"). Forecasters include a small transformer for returns, GARCH(1,1) for volatility, and a 5-layer LSTM for Google Trends, each producing nominal intervals by Gaussian quantiles.

The benchmark is Adaptive Conformal Inference (ACI), which recursively updates $\alpha_t$ via

$$\alpha_t = \alpha_{t-1} + \gamma\,[\bar\alpha - \mathrm{err}_{t-1}]$$

for a fixed step size $\gamma$. Metrics comprise 500-point centered moving averages of miscoverage and interval length,

$$\mathrm{LocalMiscov}_t = \frac{1}{500}\sum_{s=t-250}^{t+250} \mathrm{err}_s, \qquad \mathrm{LocalLength}_t = \frac{1}{500}\sum_{s=t-250}^{t+250} \big|C_s(1-\alpha_s)\big|,$$

together with the proportion of intervals of infinite length (which signal uninformative coverage).
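The local metrics are plain centered moving averages; a sketch using `np.convolve` (function name invented):

```python
import numpy as np

def local_average(x, window=500):
    """Centered moving average used for LocalMiscov / LocalLength."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")
```

Applied to the binary sequence $(\mathrm{err}_s)$ it yields the local miscoverage curve, and applied to the interval lengths it yields the local length curve.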

Findings from the data include:

  • BCI and ACI both achieve near-target 10% miscoverage.
  • BCI yields consistently shorter average intervals (e.g., return series: 0.08 vs. 0.09) and avoids uninformative, infinite-length intervals observed with ACI under heavy distribution shifts or loose control.
  • Even when forecaster intervals are well-calibrated (e.g., GARCH on volatility), BCI matches ACI in coverage and interval length while robustly avoiding infinite intervals.
  • Largest benefits occur when nominal forecaster intervals are poorly calibrated (e.g., LSTM on Google Trends), with BCI reducing average interval widths by approximately 20% at the same coverage level (Yang et al., 2024).

7. Interpretation, Scope, and Relation to Existing Methods

BCI generalizes conformal inference for time series by incorporating dynamic programming and explicit multi-step prediction. Unlike ACI, which updates its coverage controller myopically, BCI reasons over a finite prediction horizon via stochastic control, optimizing the length-vs-coverage tradeoff. The methodology guarantees long-run frequentist coverage under arbitrary nonstationarity, adversarial distribution shifts, and even poor model calibration, while producing substantially tighter and more informative intervals. BCI thus serves as a robust wrapper for any black-box forecasting pipeline, offering rigorous guarantees without assumptions on model correctness or data structure (Yang et al., 2024).
