
Predictive inference for time series: why is split conformal effective despite temporal dependence?

Published 2 Oct 2025 in stat.ML, cs.LG, math.ST, and stat.TH | (2510.02471v1)

Abstract: We consider the problem of uncertainty quantification for prediction in a time series: if we use past data to forecast the next time point, can we provide valid prediction intervals around our forecasts? To avoid placing distributional assumptions on the data, in recent years the conformal prediction method has been a popular approach for predictive inference, since it provides distribution-free coverage for any iid or exchangeable data distribution. However, in the time series setting, the strong empirical performance of conformal prediction methods is not well understood, since even short-range temporal dependence is a strong violation of the exchangeability assumption. Using predictors with "memory" -- i.e., predictors that utilize past observations, such as autoregressive models -- further exacerbates this problem. In this work, we examine the theoretical properties of split conformal prediction in the time series setting, including the case where predictors may have memory. Our results bound the loss of coverage of these methods in terms of a new "switch coefficient", measuring the extent to which temporal dependence within the time series creates violations of exchangeability. Our characterization of the coverage probability is sharp over the class of stationary, $\beta$-mixing processes. Along the way, we introduce tools that may prove useful in analyzing other predictive inference methods for dependent data.

Summary

  • The paper introduces the switch coefficient to quantify temporal dependence and explains its role in maintaining prediction interval coverage.
  • It establishes sharp lower and upper bounds on coverage for stationary β-mixing processes, with a coverage loss that shrinks at rate 1/n in the sample size n.
  • The framework is applicable to predictors with memory and black-box ML models, offering practical insights for uncertainty quantification in time series forecasting.

Predictive Inference for Time Series: Split Conformal Prediction under Temporal Dependence

Introduction

This paper addresses the theoretical underpinnings of split conformal prediction for time series data, focusing on the challenge posed by temporal dependence. Conformal prediction is widely used for uncertainty quantification in predictive modeling because it offers distribution-free coverage guarantees under exchangeability. Time series data, however, inherently violate exchangeability, and the violation is worse when predictors use historical observations ("memory"). Despite this, split conformal prediction often performs well empirically in time series contexts. The paper provides a rigorous explanation for this phenomenon, introducing the "switch coefficient" as a measure of deviation from exchangeability and establishing sharp coverage bounds for stationary, $\beta$-mixing processes.

Problem Formulation and Conformal Prediction in Time Series

The predictive inference problem is formalized for a time series $Z = (Z_1, \ldots, Z_{n+1})$, where each $Z_i = (X_i, Y_i)$ consists of covariates and a response. The goal is to construct a prediction interval for $Y_{n+1}$ using a predictive model $\widehat{f}$ and the observed data $(X_i, Y_i)_{i=1}^n$. Split conformal prediction constructs prediction sets based on the quantiles of conformity scores, which are typically absolute residuals $s(z) = |y - \widehat{f}(x)|$. The method is agnostic to the underlying predictive model and only requires exchangeability for its theoretical guarantees.
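The construction described above can be sketched in a few lines. This is a minimal illustration of generic split conformal prediction with absolute-residual scores, not code from the paper; the function names `conformal_quantile` and `split_conformal_interval`, and the choice of the ceil((1 - alpha)(m + 1))-th smallest score as the threshold, follow the standard textbook recipe and are assumptions here.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((1 - alpha) * (m + 1))-th smallest of m calibration scores."""
    m = len(scores)
    rank = math.ceil((1 - alpha) * (m + 1))
    if rank > m:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[rank - 1]

def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Interval predict(x_new) +/- q, with q the conformal quantile of |y - predict(x)|."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q

# Hypothetical usage: a fixed (pre-trained) predictor and nine calibration points.
predict = lambda x: 2.0 * x
calib_x = list(range(9))
calib_y = [2.0 * x + (r if x % 2 else -r) for x, r in zip(calib_x, range(1, 10))]
lo, hi = split_conformal_interval(predict, calib_x, calib_y, x_new=10, alpha=0.1)
```

The predictor is treated as a black box: only its residuals on the calibration points enter the interval, which is why the method is model-agnostic.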

In practice, predictors often have memory, i.e., they depend on the previous $L$ observations. This complicates the analysis, as the conformity scores are no longer independent of the calibration data, and the temporal dependence can induce strong violations of exchangeability.
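To make the role of memory concrete, here is a small sketch for the purely autoregressive case (no covariates, which is a simplifying assumption). The predictor sees a window of the previous `L` responses, so the first `L` time points cannot be scored. The names `memory_scores` and `memory_conformal_interval` are hypothetical, and the window predictor is supplied by the caller.

```python
import math

def memory_scores(y, predict_from_window, L):
    """Absolute residuals of a predictor that sees the previous L values.

    Only indices L, ..., len(y) - 1 have a full window, so len(y) - L scores result.
    """
    return [abs(y[i] - predict_from_window(y[i - L:i])) for i in range(L, len(y))]

def memory_conformal_interval(y, predict_from_window, L, alpha=0.1):
    """Interval for the next value, calibrated on the scores of the observed series."""
    scores = sorted(memory_scores(y, predict_from_window, L))
    m = len(scores)
    rank = math.ceil((1 - alpha) * (m + 1))
    q = scores[rank - 1] if rank <= m else math.inf
    center = predict_from_window(y[-L:])  # forecast from the last L observations
    return center - q, center + q

# Hypothetical usage: "last value" forecaster with a window of length 2.
series = list(range(12))
last_value = lambda window: window[-1]
interval = memory_conformal_interval(series, last_value, L=2, alpha=0.1)
```

Consecutive scores share overlapping windows, which is one way the dependence between scores described above arises even for a fixed predictor.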

The Switch Coefficient: Quantifying Temporal Dependence

The paper introduces the switch coefficient $\Psi_{k,\tau}(Z)$, defined as the total variation distance between two specific subvectors of the time series obtained by deleting blocks of entries. The averaged switch coefficient $\bar\Psi_\tau(Z)$ quantifies the overall deviation from exchangeability for a given lag $\tau$. For stationary $\beta$-mixing processes, the switch coefficient is shown to be bounded by the mixing coefficient, i.e., $\Psi_{k,\tau}(Z) \leq 2\beta(\tau)$ for $k \leq n-\tau$.

This framework allows the authors to relate the coverage properties of conformal prediction directly to the temporal dependence structure of the data, rather than relying on exchangeability or independence.

Main Theoretical Results

Coverage Guarantees

The central result is a lower bound on the coverage probability of split conformal prediction intervals in terms of the switch coefficient:

$$P\{Y_{n+1} \in C(X_{n+1}; Z_n, \ldots, Z_{n-L+1})\} \geq 1 - \alpha - \min_{\tau} \left\{ \frac{\tau}{n-L+1} + \bar\Psi_\tau(S) \right\}$$

where $S$ is the vector of conformity scores, and $L$ is the memory length of the predictor. For stationary $\beta$-mixing processes, this becomes:

$$P\{Y_{n+1} \in C(X_{n+1}; Z_n, \ldots, Z_{n-L+1})\} \geq 1 - \alpha - \min_{\tau} \left\{ \frac{\tau+L}{n-L+1} + 2\beta(\tau) \right\}$$
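As a numerical illustration, the coverage-loss term $\min_{\tau} \{ (\tau+L)/(n-L+1) + 2\beta(\tau) \}$ can be evaluated directly once a mixing profile is fixed. The sketch below assumes a geometric profile $\beta(\tau) = \rho^\tau$ and searches $\tau$ over $1, \ldots, n-L$; both the profile and the search range are assumptions made for illustration, and the helper names are hypothetical.

```python
def geometric_beta(rho):
    """Assumed mixing profile beta(tau) = rho ** tau, with 0 < rho < 1."""
    return lambda tau: rho ** tau

def coverage_loss_bound(n, L, beta):
    """Minimize (tau + L) / (n - L + 1) + 2 * beta(tau) over tau = 1, ..., n - L.

    Returns (smallest value found, tau attaining it).
    """
    best_loss, best_tau = float("inf"), None
    for tau in range(1, n - L + 1):
        loss = (tau + L) / (n - L + 1) + 2 * beta(tau)
        if loss < best_loss:
            best_loss, best_tau = loss, tau
    return best_loss, best_tau

# Hypothetical usage: memory L = 5 and rho = 0.5 at two sample sizes.
loss_1k, tau_1k = coverage_loss_bound(1000, 5, geometric_beta(0.5))
loss_10k, tau_10k = coverage_loss_bound(10000, 5, geometric_beta(0.5))
```

Under this assumed profile the minimizing lag grows only logarithmically in $n$, so the first term dominates and the bound shrinks roughly like $(\log n)/n$.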

This result is sharp, as demonstrated by a matching lower bound on the coverage loss, up to a universal constant. The coverage loss scales linearly in $1/n$ and linearly in the mixing time, providing a more precise characterization than previous results, which only established sublinear rates.

Split Conformal with Data-Dependent Scores

For split conformal prediction where the score function is trained on a subset of the data, the coverage guarantee is extended to account for the dependence between the score function and the calibration data. The bound involves deleting initial calibration scores to mitigate this dependence, and the coverage loss is again controlled by the switch coefficient and the $\beta$-mixing coefficients.
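One way to picture the deletion step is a train/calibration split with a gap between the two segments. The sketch below is an assumption-laden illustration, not the paper's procedure: it fits a no-intercept AR(1) slope on the first `n_train` points and then discards the first `gap` calibration scores before calibrating. The names `fit_ar1`, `gapped_calibration_scores`, and the parameter `gap` are hypothetical.

```python
def fit_ar1(train):
    """Least-squares slope a for the no-intercept model y[t] ~ a * y[t-1]."""
    num = sum(train[t] * train[t - 1] for t in range(1, len(train)))
    den = sum(v * v for v in train[:-1])
    return num / den if den > 0 else 0.0

def gapped_calibration_scores(series, n_train, gap):
    """Fit on series[:n_train], then score only the points from index n_train + gap on.

    Skipping `gap` points weakens the dependence between the fitted score
    function and the scores that are actually used for calibration.
    """
    a = fit_ar1(series[:n_train])
    start = max(n_train + gap, 1)  # every scored point needs a predecessor
    scores = [abs(series[t] - a * series[t - 1]) for t in range(start, len(series))]
    return a, scores

# Hypothetical usage: a noiseless doubling series, six training points, gap of two.
series = [2 ** t for t in range(10)]
a_hat, calib_scores = gapped_calibration_scores(series, n_train=6, gap=2)
```

A larger `gap` trades calibration sample size for weaker dependence, which mirrors the trade-off between the two terms in the coverage bound.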

Overcoverage Analysis

The paper also establishes an upper bound on the coverage probability, showing that the conformal prediction set is not overly conservative. The switch coefficient provides both lower and upper bounds, ensuring that the prediction intervals are neither too narrow nor too wide.

Implications and Comparison to Prior Work

The results explain the empirical effectiveness of split conformal prediction in time series settings, even when temporal dependence is present. The switch coefficient provides a unified framework for quantifying deviations from exchangeability and can be used to analyze other predictive inference methods for dependent data. The bounds are tighter than those obtained via blocking and empirical process techniques, which typically yield sublinear rates in $n$.
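A small Monte Carlo check can make this empirical claim tangible. The sketch below simulates a stationary Gaussian AR(1) process, a standard example of a geometrically $\beta$-mixing series, and applies split conformal prediction with a deliberately misspecified one-lag predictor so that the scores remain dependent. All parameter values (`phi`, `a`, `n`, `trials`, `seed`) are arbitrary choices for illustration, not settings from the paper.

```python
import math
import random

def simulate_ar1(n, phi, rng):
    """Stationary AR(1) path of length n + 1 with unit-variance Gaussian noise."""
    y = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))]  # stationary start
    for _ in range(n):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y

def covered(y, a, alpha):
    """Calibrate on all but the last score, then test whether the last one is covered."""
    scores = [abs(y[t] - a * y[t - 1]) for t in range(1, len(y))]
    calib, test_score = sorted(scores[:-1]), scores[-1]
    m = len(calib)
    rank = math.ceil((1 - alpha) * (m + 1))
    q = calib[rank - 1] if rank <= m else math.inf
    return test_score <= q

def empirical_coverage(trials=500, n=200, phi=0.8, a=0.5, alpha=0.1, seed=0):
    """Fraction of independent simulated series whose final point is covered."""
    rng = random.Random(seed)
    hits = sum(covered(simulate_ar1(n, phi, rng), a, alpha) for _ in range(trials))
    return hits / trials

coverage = empirical_coverage()
```

The bounds discussed above suggest that `coverage` should sit close to the nominal level $1 - \alpha$ for a fast-mixing process like this one; the script only estimates that quantity and carries Monte Carlo error.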

The analysis accommodates predictors with arbitrary memory and does not require consistency of the predictive model, making it applicable to black-box ML models commonly used in practice. The theoretical guarantees are robust to strong short-range dependence, provided the mixing coefficients decay sufficiently with lag.

Future Directions

The switch coefficient is a natural object for studying stochastic processes and may have applications beyond conformal prediction, such as in online learning and statistical inference for dependent data. The proof techniques, which exploit the stability of quantile functions under addition and deletion of scores, could lead to sharper analyses of other uncertainty quantification methods in dynamic settings.

Conclusion

This work provides a rigorous theoretical foundation for the use of split conformal prediction in time series analysis, demonstrating that coverage guarantees can be maintained under temporal dependence by quantifying deviations from exchangeability via the switch coefficient. The results are sharp and broadly applicable, offering practical guidance for uncertainty quantification in time series forecasting with black-box models. The framework and techniques introduced have potential for further development in the analysis of predictive inference under dependence.
