
Variance-Adaptive Confidence Sequences

Updated 25 December 2025
  • Variance-adaptive CSs are sequences of confidence intervals that adapt their width using the cumulative empirical variance, ensuring nonasymptotic and time-uniform coverage.
  • They leverage self-normalization and martingale exponential techniques to achieve optimal shrinking rates, matching limits dictated by the law of the iterated logarithm.
  • These methods extend to heavy-tailed data, matrix mean estimation, and adaptive online inference in settings like bandit algorithms and reinforcement learning.

A variance-adaptive confidence sequence (CS) is a sequence of confidence intervals for an online, possibly non-i.i.d., stochastic process, whose width adapts at each time $t$ to the empirical variance accumulated so far. Such sequences provide nonasymptotic, nonparametric, and time-uniform coverage guarantees, meaning the probability of ever excluding the true quantity of interest across all times is controlled at a prescribed level. Variance-adaptive CSs generalize classical fixed-variance (sub-Gaussian) boundaries, achieve optimal shrinking rates (matching the law of the iterated logarithm), and have been extended to settings such as matrix mean estimation, heavy-tailed data, and sampling without replacement.

1. Foundations and Nonparametric Setting

Variance-adaptive CSs, particularly those of the “empirical-Bernstein” type, are grounded in minimal assumptions. The prototypical setup involves a sequence of real-valued random variables $X_t \in [a,b]$, a predictable sequence of “predictions” $\hat X_t$, and the observed filtration $\{\mathcal{F}_t\}$. The only technical condition required is that the martingale difference sequence $Y_t = X_t - \mathbb{E}[X_t \mid \mathcal{F}_{t-1}]$ is almost surely bounded by $c = b-a$ (Howard et al., 2018).

The primary estimands are:

  • The mean process $\mu_t = t^{-1} \sum_{i=1}^t \mathbb{E}[X_i \mid \mathcal{F}_{i-1}]$
  • The variance process (empirical proxy) $V_t = \sum_{i=1}^t (X_i - \hat X_i)^2$

These sequences remain valid without independence, identical distribution, or strong tail assumptions; a minimal online computation of the two estimands is sketched below.
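As a concrete illustration, here is a minimal Python sketch (not from the cited papers) of the online updates for $\widehat{\mu}_t$ and $V_t$, using the running mean of past observations as the predictable prediction $\hat X_t$; the class name, and the prior guess of 0.5 for data in $[0,1]$, are illustrative choices.

```python
class OnlineEstimands:
    """Running mean and empirical-variance proxy V_t = sum_i (X_i - X_hat_i)^2."""

    def __init__(self, prior_guess: float = 0.5):
        self.n = 0
        self.sum_x = 0.0
        self.v = 0.0                    # accumulated variance proxy V_t
        self.prior_guess = prior_guess  # prediction X_hat_1 before any data

    def update(self, x: float) -> None:
        # Predictability: X_hat_t depends only on observations before time t.
        x_hat = self.sum_x / self.n if self.n > 0 else self.prior_guess
        self.v += (x - x_hat) ** 2
        self.sum_x += x
        self.n += 1

    @property
    def mean(self) -> float:
        return self.sum_x / self.n      # mu_hat_t = t^{-1} sum_i X_i
```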

2. Empirical-Bernstein (Variance-Adaptive) CS Construction

The empirical-Bernstein confidence sequence is built upon a self-normalization/martingale-exponential construction:

  • For all $\lambda \in [0, 1/c)$,

$$\mathbb{E}\left[\exp\left\{\lambda \sum_{i=1}^t Y_i - \psi_{E,c}(\lambda)\, V_t\right\}\right] \le 1$$

where $\psi_{E,c}(\lambda) = c^{-2}\left(-\ln(1-c\lambda) - c\lambda\right)$ (Howard et al., 2018).
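This inequality is easy to probe numerically. The following Monte Carlo sketch (illustrative, with assumed Bernoulli increments) uses centered Bernoulli($p$) differences, for which $c = 1$ and the conditional mean $p$ is a valid predictable prediction:

```python
import numpy as np

# Check that E[exp{lambda * sum(Y_i) - psi_{E,c}(lambda) * V_t}] <= 1
# for centered Bernoulli(p) increments with predictions X_hat_i = p.
rng = np.random.default_rng(0)
p, lam, c, t = 0.3, 0.5, 1.0, 50               # requires lambda in [0, 1/c)
psi = (-np.log(1.0 - c * lam) - c * lam) / c**2

x = rng.binomial(1, p, size=(100_000, t)).astype(float)
y = x - p                                       # bounded martingale differences
v = (y ** 2).sum(axis=1)                        # variance proxy V_t
z = np.exp(lam * y.sum(axis=1) - psi * v)
print(z.mean())                                 # <= 1 up to Monte Carlo error
```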

For any “subexponential” uniform boundary $u(v)$,

$$\mathbb{P}\left( \sup_{t\geq 1} \left|\widehat{\mu}_t - \mu_t\right| > u(V_t)/t \right) \leq 2\alpha$$

where $\widehat{\mu}_t = t^{-1} \sum_{i=1}^t X_i$, and $w_t = u(V_t)/t$ is the data-driven, variance-adaptive width.

A widely used closed-form instantiation is the “polynomial-stitched” boundary for $X_t \in [0,1]$ and coverage $1-2\alpha$:

$$C_t = \widehat{\mu}_t \pm \frac{1}{t}\left(1.7\sqrt{V_t\left[\log\log(2V_t)+3.8\right]} + 3.4\left[\log\log(2V_t)+3.8\right]\right)$$

as given in Eq. (27) of (Howard et al., 2018).
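In code, the boundary is nearly a one-liner. The sketch below is hedged: the floor inside the iterated logarithm, which keeps the formula defined for small $V_t$, is an implementation choice rather than part of Eq. (27), and the constants 1.7 and 3.8 are tied to the particular coverage level quoted there.

```python
import math

def stitched_radius(v_t: float, t: int) -> float:
    """Half-width of the polynomial-stitched CS at time t (cf. Eq. (27))."""
    # Guard so that log(log(2 V_t)) is defined when V_t is small.
    ll = math.log(math.log(max(2.0 * v_t, math.e))) + 3.8
    return (1.7 * math.sqrt(v_t * ll) + 3.4 * ll) / t

# Illustrative use: V_t = 40 after t = 1000 observations.
radius = stitched_radius(40.0, 1000)  # interval is mu_hat_t +/- radius
```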

3. Time-Uniform Coverage and LIL-Optimal Shrinkage

Variance-adaptive CSs provide time-uniform nonasymptotic coverage:

$$\mathbb{P}\left(\forall t \ge 1: \mu_t \in \left[\widehat{\mu}_t \mp u(V_t)/t\right]\right) \ge 1-2\alpha$$

The width $w_t$ adapts to observed variance and, for (sub-)i.i.d. data with variance $\sigma^2$, $V_t \asymp \sigma^2 t$, so:

  • $w_t \asymp \sqrt{\sigma^2 \log\log t / t}$
  • This matches the lower bound dictated by the law of the iterated logarithm (LIL) for uniform-in-time confidence intervals (Howard et al., 2018); a numerical illustration follows below.
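A quick numerical illustration of the rate, assuming $V_t \approx \sigma^2 t$ and reusing the stitched radius from Section 2 (a sketch, not a proof): the ratio $w_t / \sqrt{\sigma^2 \log\log t / t}$ drifts slowly toward a constant as the iterated logarithm comes to dominate the additive constants.

```python
import math

def stitched_radius(v_t: float, t: int) -> float:
    ll = math.log(math.log(max(2.0 * v_t, math.e))) + 3.8
    return (1.7 * math.sqrt(v_t * ll) + 3.4 * ll) / t

sigma2 = 0.04                              # assumed per-step variance
for t in (10**3, 10**5, 10**7, 10**9):
    w = stitched_radius(sigma2 * t, t)     # V_t ~ sigma^2 t
    lil = math.sqrt(sigma2 * math.log(math.log(t)) / t)
    print(t, round(w / lil, 2))            # ratio slowly approaches a constant
```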

4. Comparison to Fixed-Variance and Other Adaptive CSs

A sub-Gaussian (fixed-variance) CS with worst-case variance $(b-a)^2/4$ produces

$$\left|\widehat{\mu}_t-\mu\right| \le \frac{b-a}{2} \sqrt{\frac{2\log(1/\alpha)}{t}}$$

which can be extremely conservative if the actual variance is small.

Empirical-Bernstein (variance-adaptive) CSs instead use the empirical $V_t$, sharply tightening intervals when the process is low-variance. For Bernoulli(0.01) data, sub-Gaussian CSs can be $5\times$ wider than the empirical-Bernstein CS (Howard et al., 2018); a numerical comparison is sketched below.
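A back-of-the-envelope comparison (hedged: it assumes the stitched boundary above and a typical variance accumulation $V_t \approx p(1-p)t$, and the exact ratio depends on $t$ and on which boundaries are compared):

```python
import math

def stitched_radius(v_t: float, t: int) -> float:
    ll = math.log(math.log(max(2.0 * v_t, math.e))) + 3.8
    return (1.7 * math.sqrt(v_t * ll) + 3.4 * ll) / t

p, alpha, t = 0.01, 0.025, 10**6
v_t = p * (1 - p) * t                     # typical V_t for Bernoulli(0.01)
w_eb = stitched_radius(v_t, t)
# Fixed-variance sub-Gaussian width from the display above, with b - a = 1:
w_sg = 0.5 * math.sqrt(2.0 * math.log(1.0 / alpha) / t)
print(w_sg / w_eb)                        # sub-Gaussian width is several times wider
```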

Variance-adaptive CSs have extensions for heavy-tailed and infinite-variance settings, such as Catoni-style CSs under known-variance or $p$-th-moment bounds (Wang et al., 2022), and CSs that exploit nonnegativity constraints in heavy-tailed settings (Mineiro, 2022).

5. Methodological Extensions and Matrix Generalizations

Recent developments yield closed-form, mixture-based empirical-Bernstein CSs for both scalar and matrix means. The construction uses the variance proxy

$$V_t = \sum_{i=1}^t \psi_E\left(|X_i-\hat X_i|\right), \qquad \psi_E(\lambda) = -\ln(1-\lambda) - \lambda,$$

and defines the width

$$W_t = \frac{2}{t} \sqrt{U_t\left(\ell_\alpha + \frac{1}{2}\ln(2U_t)\right)}$$

with $U_t = 1/(2\kappa^2) + V_t$, where $\kappa > 0$ is a tuning parameter and $\ell_\alpha$ is an explicit logarithmic factor.
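Transcribed directly from the display above, a minimal sketch of the width computation; $\kappa$ and $\ell_\alpha$ are left as inputs because the source introduces them only as a tuning parameter and an explicit log factor, without reproducing $\ell_\alpha$ here.

```python
import math

def mixture_width(v_t: float, t: int, kappa: float, ell_alpha: float) -> float:
    """W_t = (2/t) * sqrt(U_t * (ell_alpha + 0.5 * ln(2 U_t)))."""
    u_t = 1.0 / (2.0 * kappa**2) + v_t     # U_t = 1/(2 kappa^2) + V_t
    return (2.0 / t) * math.sqrt(u_t * (ell_alpha + 0.5 * math.log(2.0 * u_t)))
```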

  • For a sequence of symmetric matrices $X_t$ with bounded eigenvalues, the same polynomial structure yields a CS for the maximal eigenvalue deviation:

$$\left|\gamma_{\max}\left(\bar X_t - M_t\right)\right| \le W_t$$

where $V_t = \sum_{i=1}^t \psi_E(|X_i - \hat X_i|)$ (interpreted in matrix norm) and $M_t = t^{-1} \sum_{i=1}^t \mathbb{E}_{i-1}[X_i]$ (Chugg et al., 24 Dec 2025).

A key property of these new CSs is that, in the constant-mean, i.i.d. regime, the limiting width scaled by $\sqrt{t/\log t}$ is independent of the confidence level $\alpha$, a provable improvement over previous closed-form solutions.

6. Applications and Empirical Performance

Variance-adaptive CSs are widely applicable:

  • Covariance matrix estimation
  • Sample average treatment effect inference under the Neyman-Rubin potential outcomes model
  • Bandit algorithms and A/B testing with continuous monitoring
  • Adaptive and safe inference in reinforcement learning and online learning
  • Sampling without replacement, yielding substantial improvements when the sample variance is much less than the worst-case variance of the population (Waudby-Smith et al., 2020)
  • Linear bandits, where variance-adaptive CSs are used to build ellipsoidal confidence sets for $\theta^*$ with widths scaling with the sum of observed conditional variances (Jun et al., 12 Feb 2024)

Empirical studies (Chugg et al., 24 Dec 2025) show these CSs match or outperform previous variance-adaptive CSs and maintain coverage over time horizons of up to $10^6$ samples. The gains are especially pronounced in low-variance, nonstationary, or time-varying-mean settings.

7. Theoretical and Practical Implications

Variance-adaptive CSs represent a sharp advance in anytime-valid inference, combining:

  • Time-uniform coverage with LIL-optimal shrinking
  • Fully nonparametric applicability, using data-driven variance proxies
  • The ability to handle non-i.i.d., martingale-dependent, and heavy-tailed settings (with appropriate extensions)
  • Closed-form, practically implementable expressions (e.g., the latest mixture-Bernstein CS (Chugg et al., 24 Dec 2025))
  • A robust foundation in self-normalized martingale concentration, built from mixture martingales, Ville's inequality, and polynomial “stitching”

Their flexibility and optimality have positioned variance-adaptive CSs as standard primitives in modern sequential estimation, especially as uncertainty quantification tools in high-frequency, online, or nonstationary environments (Howard et al., 2018, Chugg et al., 24 Dec 2025, Wang et al., 2022, Mineiro, 2022, Waudby-Smith et al., 2020, Jun et al., 12 Feb 2024).
