Confidence Sequences: Time-Uniform Inference
- Confidence sequences are sequences of confidence intervals that provide time-uniform, finite-sample coverage for a parameter across all sample sizes.
- They are constructed using martingale and supermartingale techniques, incorporating exponential bounds, mixture methods, and stitching processes.
- Their robust coverage properties empower sequential inference in modern applications such as A/B testing, bandit problems, and risk-limiting audits.
A confidence sequence (CS) is a sequence of confidence intervals or sets that provides time-uniform, finite-sample, nonasymptotic coverage for a parameter of interest, such as a mean, quantile, or regression coefficient, across all sample sizes. For a target parameter $\theta$, a $(1-\alpha)$-CS $(C_t)_{t \ge 1}$ satisfies $P(\forall t \ge 1: \theta \in C_t) \ge 1 - \alpha$, guaranteeing the prescribed coverage under arbitrary, even data-dependent, stopping rules. Confidence sequences unify concepts from classical confidence intervals, the law of the iterated logarithm, sequential probability ratio testing, and martingale theory, and are now central to robust, sequential inference in modern applications such as A/B testing, bandits, off-policy evaluation, risk-limiting audits, and robust anomaly detection.
1. Probabilistic Principle: Time-Uniform, Anytime-Valid Inference
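A confidence sequence provides time-uniform frequentist error control. Formally, for a stream of data $X_1, X_2, \dots$, a confidence sequence for a parameter $\theta$ (e.g., a mean or quantile) is a sequence of sets $(C_t)_{t \ge 1}$, with each $C_t$ computed from $X_1, \dots, X_t$, where

$$P(\forall t \ge 1: \theta \in C_t) \ge 1 - \alpha.$$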
This guarantee holds under every stopping rule, including fully adaptive, data-dependent monitoring ("anytime-validity") (Maharaj et al., 2023, Howard et al., 2018). It is obtained by constructing nonnegative supermartingales (processes whose expected future value, conditional on the past, never increases) and applying Ville's maximal inequality: for any nonnegative supermartingale $(M_t)_{t \ge 0}$ with $M_0 = 1$, $P(\exists t \ge 1: M_t \ge 1/\alpha) \le \alpha$ (Howard et al., 2018). Inverting the event $\{M_t(\theta') < 1/\alpha\}$ over candidate parameter values $\theta'$ yields the set-valued process $C_t = \{\theta' : M_t(\theta') < 1/\alpha\}$.
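As a rough illustration of the inversion step, the sketch below builds about the simplest such sequence for the mean of observations assumed to be $\sigma$-sub-Gaussian with known $\sigma$. It uses the fixed-$\lambda$ exponential supermartingale $M_t(m) = \exp\{\lambda \sum_{i \le t}(X_i - m) - t\lambda^2\sigma^2/2\}$ together with its mirror image (using $-\lambda$), each at level $\alpha/2$, and keeps the candidate means $m$ for which neither process has reached $2/\alpha$. The function names, the parameter `lam`, and all default values are illustrative assumptions, not taken from the cited papers.

```python
import math
import random


def fixed_lambda_cs(xs, alpha=0.05, sigma=1.0, lam=0.5):
    """Return a list of (lower, upper) bounds, one per observation.

    Assumes the observations are sigma-sub-Gaussian around their mean.
    At time t the interval is the set of candidate means m for which
    exp(+lam * sum(x_i - m) - t * lam**2 * sigma**2 / 2) and its
    mirror image with -lam both stay below 2 / alpha.
    """
    if lam <= 0 or not 0 < alpha < 1:
        raise ValueError("need lam > 0 and 0 < alpha < 1")
    bounds = []
    total = 0.0
    for t, x in enumerate(xs, start=1):
        total += x
        mean = total / t
        # Solving M_t(m) = 2 / alpha for m gives this half-width.
        radius = math.log(2.0 / alpha) / (lam * t) + lam * sigma**2 / 2.0
        bounds.append((mean - radius, mean + radius))
    return bounds


def miscoverage_rate(true_mean=0.0, horizon=200, runs=500, seed=0, **kwargs):
    """Fraction of simulated unit-variance Gaussian streams whose
    sequence misses true_mean at any time up to the horizon."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(runs):
        xs = [rng.gauss(true_mean, 1.0) for _ in range(horizon)]
        cs = fixed_lambda_cs(xs, **kwargs)
        if any(not (lo <= true_mean <= hi) for lo, hi in cs):
            misses += 1
    return misses / runs
```

Because `lam` is fixed, the half-width tends to `lam * sigma**2 / 2` rather than to zero as `t` grows. Mixing over `lam` or stitching across epochs, as in the constructions of Section 2, is what lets the width shrink.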
2. Construction Methodologies for Confidence Sequences
Several general methods for constructing confidence sequences have been developed, all built on martingales or supermartingales. The most important, which apply across parametric and nonparametric settings, include:
- Exponential Supermartingale Approach (Sub-$\psi$): Given a convex function $\psi$ (often a cumulant generating function), construct $\exp\{\lambda S_t - \psi(\lambda) V_t\}$ as a supermartingale for each admissible $\lambda$. Here $S_t$ is a (centered) martingale sum and $V_t$ an adapted variance proxy. The confidence boundary $u_\alpha$ is then chosen so that $P(\exists t \ge 1: S_t \ge u_\alpha(V_t)) \le \alpha$ (Howard et al., 2018, Wang et al., 2022).
- Mixture Martingale and Bayesian Prior-Posterior Ratio Techniques: Using conjugate mixtures or the prior-to-posterior ratio martingale yields CSs for a general class of models, including sampling without replacement (Waudby-Smith et al., 2020) and arbitrary exponential families (2002.03658, Cortinovis et al., 28 Jun 2025, Waudby-Smith et al., 2020). This yields closed-form or readily computable boundaries in, e.g., Gaussian, Bernoulli, or hypergeometric models.
- Stitching and Peeling Techniques: Divide intrinsic time $V_t$ into epochs, then construct piecewise-linear or curved (finite-LIL rate) boundaries by union-bounding over epochs (Howard et al., 2018). This is critical for achieving minimax (iterated-logarithm) shrinkage rates.
- Gambling and Wealth-Process Frameworks: Developed through the coin-toss/two-horse-race betting setup for univariate means, these constructions extend to categorical, multivariate, and probability-vector means using universal portfolio and mixture-Dirichlet weights (Ryu et al., 2024, Ryu et al., 2022).
- Robust and Heavy-Tail Approaches: Catoni-style and robust exponential supermartingales extend CSs to data with only a finite $p$-th moment (Wang et al., 2022, Bhatt et al., 2022, Wang et al., 2023).
- Generalized Linear Models via Online-to-Confidence-Set Reductions: Low online log-loss regret in sequential prediction is shown to imply time-uniform confidence sets for GLM parameters (Clerico et al., 23 Apr 2025).
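
To make the mixture construction concrete, here is a minimal sketch of a two-sided normal-mixture boundary for a $\sigma$-sub-Gaussian mean, assuming $\sigma$ is known and the intrinsic time is $V_t = \sigma^2 t$. Mixing $\exp\{\lambda S - \lambda^2 V/2\}$ over $\lambda \sim N(0, 1/\rho)$ gives the closed form $\sqrt{\rho/(V+\rho)}\,\exp\{S^2/(2(V+\rho))\}$, and setting it equal to $1/\alpha$ gives the boundary used below. The function names and the tuning parameter `rho` are illustrative assumptions rather than an API from the cited works.

```python
import math


def mixture_martingale(s, v, rho=1.0):
    """Closed-form N(0, 1/rho) mixture of exp(lam * s - lam**2 * v / 2)."""
    return math.sqrt(rho / (v + rho)) * math.exp(s * s / (2.0 * (v + rho)))


def normal_mixture_radius(t, alpha=0.05, sigma=1.0, rho=1.0):
    """Half-width around the running mean after t observations.

    Solves mixture_martingale(S, sigma**2 * t, rho) = 1 / alpha for |S|,
    then divides by t to move from the sum scale to the mean scale.
    """
    v = sigma**2 * t
    boundary = math.sqrt((v + rho) * math.log((v + rho) / (rho * alpha**2)))
    return boundary / t


def normal_mixture_cs(xs, alpha=0.05, sigma=1.0, rho=1.0):
    """Return (lower, upper) bounds for the mean after each observation."""
    bounds = []
    total = 0.0
    for t, x in enumerate(xs, start=1):
        total += x
        mean = total / t
        r = normal_mixture_radius(t, alpha, sigma, rho)
        bounds.append((mean - r, mean + r))
    return bounds
```

For large `t` the half-width behaves roughly like $\sqrt{\log t / t}$, which is slightly wider than the iterated-logarithm rate that stitching targets. The choice of `rho` roughly determines the range of sample sizes at which the boundary is tightest.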
3. Coverage Properties, Rates, and Optimality
Confidence sequences achieve time-uniform coverage that is either nonasymptotic and exact or sharply asymptotic. Under mild assumptions, their widths decay with the sample size $n$ at minimax rates:
| Regime / Model | Width of CS | Reference |
|---|---|---|
| Sub-Gaussian / bounded variance | $O(\sqrt{\log\log n / n})$ | (Howard et al., 2018, Wang et al., 2022) |
| Bounded $p$-th moment ($1 < p \le 2$) | $O(n^{-(p-1)/p})$ |