Variance-Adaptive Confidence Sequences
- Variance-adaptive CSs are sequences of confidence intervals that adapt their width using the cumulative empirical variance, ensuring nonasymptotic and time-uniform coverage.
- They leverage self-normalization and martingale exponential techniques to achieve optimal shrinking rates, matching limits dictated by the law of the iterated logarithm.
- These methods extend to heavy-tailed data, matrix mean estimation, and adaptive online inference in settings such as bandit algorithms and reinforcement learning.
A variance-adaptive confidence sequence (CS) is a sequence of confidence intervals for an online, possibly non-i.i.d., stochastic process, whose width adapts at each time to the empirical variance accumulated so far. Such sequences provide nonasymptotic, nonparametric, and time-uniform coverage guarantees, meaning the probability of ever excluding the true quantity of interest across all times is controlled at a prescribed level. Variance-adaptive CSs generalize classical fixed-variance (sub-Gaussian) boundaries, achieve optimal shrinking rates (including the iterated logarithm law), and have been extended to settings such as matrix mean estimation, heavy-tailed data, and sampling without replacement.
1. Foundations and Nonparametric Setting
Variance-adaptive CSs, particularly those of the “empirical-Bernstein” type, are grounded in minimal assumptions. The prototypical setup involves a sequence of real-valued random variables $(X_t)_{t \ge 1}$, a predictable sequence of “predictions” $(\widehat X_t)_{t \ge 1}$ (each $\widehat X_t$ measurable with respect to $\mathcal F_{t-1}$), and the natural filtration $(\mathcal F_t)_{t \ge 0}$. The only technical condition required is that the martingale difference sequence $X_t - \mathbb{E}[X_t \mid \mathcal F_{t-1}]$ is almost surely bounded (by $1$, after rescaling) (Howard et al., 2018).
The primary estimands are:
- The mean process $\widetilde\mu_t := \frac{1}{t}\sum_{i=1}^{t} \mathbb{E}[X_i \mid \mathcal F_{i-1}]$
- The variance process (empirical proxy) $\widehat V_t := \sum_{i=1}^{t} (X_i - \widehat X_i)^2$
The resulting confidence sequences remain valid without independence, identical-distribution, or strong tail assumptions.
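As a concrete reading of these definitions, the following is a minimal Python sketch, assuming observations and a predictable prediction sequence; the helper name `estimands` and the choice of a lagged running mean as the predictor are illustrative, not from the source.

```python
import numpy as np

def estimands(xs, preds):
    """Running mean estimate and empirical variance process.

    xs    : observations X_1, ..., X_t
    preds : predictions Xhat_1, ..., Xhat_t, where Xhat_i may depend
            only on X_1, ..., X_{i-1} (predictability).
    """
    xs, preds = np.asarray(xs, dtype=float), np.asarray(preds, dtype=float)
    t = np.arange(1, len(xs) + 1)
    mean_est = np.cumsum(xs) / t                # bar X_t, estimating the mean process
    var_proc = np.cumsum((xs - preds) ** 2)     # Vhat_t = sum_i (X_i - Xhat_i)^2
    return mean_est, var_proc

# Example: the lagged running mean is a valid (predictable) prediction.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=1000).astype(float)
xhat = np.concatenate(([0.5], np.cumsum(x)[:-1] / np.arange(1, len(x))))
mu_hat, v_hat = estimands(x, xhat)
```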
2. Empirical-Bernstein (Variance-Adaptive) CS Construction
The empirical-Bernstein confidence sequence is built upon a self-normalization/martingale-exponential construction:
- For all $t \ge 1$, the centered sum $S_t := \sum_{i=1}^{t} \big(X_i - \mathbb{E}[X_i \mid \mathcal F_{i-1}]\big)$ is sub-exponential with variance process $\widehat V_t := \sum_{i=1}^{t} (X_i - \widehat X_i)^2$, where $\widehat X_i$ is any predictable prediction of $X_i$ (Howard et al., 2018).
For any “sub-exponential” uniform boundary $u_\alpha$ with crossing probability $\alpha$,
$$\Pr\big(\exists t \ge 1 : S_t \ge u_\alpha(\widehat V_t)\big) \le \alpha,$$
and applying the boundary to $\pm S_t$ at level $\alpha/2$ each yields the two-sided CS $\bar X_t \pm u_{\alpha/2}(\widehat V_t)/t$ for $\widetilde\mu_t$, where $\bar X_t := \frac{1}{t}\sum_{i=1}^{t} X_i$ and $u_{\alpha/2}(\widehat V_t)/t$ is the data-driven, variance-adaptive width.
A widely used closed-form instantiation is the “polynomial-stitched” boundary: for $v \ge m$ and crossing probability $\alpha$,
$$u_\alpha(v) = \sqrt{k_1^2\, v\, \ell(v)} + k_2\, c\, \ell(v), \qquad \ell(v) := s \log\log\!\left(\frac{\eta v}{m}\right) + \log\frac{\zeta(s)}{\alpha \log^s \eta},$$
with $k_1 = (\eta^{1/4} + \eta^{-1/4})/\sqrt{2}$ and $k_2 = (\sqrt{\eta} + 1)/2$, as given in Eq. (27) of (Howard et al., 2018).
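The boundary and CS above admit a direct implementation; the sketch below, assuming the stitching parameters $\eta = 2$, $s = 1.4$, $m = 1$ and scale $c$, uses one common parameterization, and the function names are illustrative rather than a reference implementation of Eq. (27).

```python
import numpy as np
from scipy.special import zeta  # Riemann zeta, used in the stitching constant

def stitched_boundary(v, alpha, c=1.0, eta=2.0, s=1.4, m=1.0):
    """Polynomial-stitched sub-exponential uniform boundary:
    u(v) = sqrt(k1^2 * v * ell(v)) + k2 * c * ell(v), claimed for v >= m."""
    v = np.maximum(np.asarray(v, dtype=float), m)  # clamp: boundary valid for v >= m
    ell = s * np.log(np.log(eta * v / m)) + np.log(zeta(s) / (alpha * np.log(eta) ** s))
    k1 = (eta ** 0.25 + eta ** -0.25) / np.sqrt(2.0)
    k2 = (np.sqrt(eta) + 1.0) / 2.0
    return np.sqrt(k1 ** 2 * v * ell) + k2 * c * ell

def eb_confidence_sequence(xs, preds, alpha=0.05):
    """Empirical-Bernstein CS: bar X_t +/- u_{alpha/2}(Vhat_t) / t,
    splitting alpha across the two tails."""
    xs, preds = np.asarray(xs, dtype=float), np.asarray(preds, dtype=float)
    t = np.arange(1, len(xs) + 1)
    center = np.cumsum(xs) / t                   # bar X_t
    v_hat = np.cumsum((xs - preds) ** 2)         # Vhat_t
    radius = stitched_boundary(v_hat, alpha / 2.0) / t
    return center - radius, center + radius
```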
3. Time-Uniform Coverage and LIL-Optimal Shrinkage
Variance-adaptive CSs provide time-uniform nonasymptotic coverage:
$$\Pr\Big(\forall t \ge 1 : \widetilde\mu_t \in \big[\bar X_t - u_{\alpha/2}(\widehat V_t)/t,\; \bar X_t + u_{\alpha/2}(\widehat V_t)/t\big]\Big) \ge 1 - \alpha.$$
The width adapts to the observed variance and, for i.i.d. data with variance $\sigma^2$, $\widehat V_t \sim \sigma^2 t$ almost surely, so:
- The width shrinks at rate $O\big(\sigma \sqrt{\log\log t / t}\big)$, matching the lower bound dictated by the law of the iterated logarithm (LIL) for uniform-in-time confidence intervals (Howard et al., 2018).
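A quick numerical reading of the LIL rate, reusing the `eb_confidence_sequence` sketch above (the data-generating choices here are illustrative): the radius rescaled by $\sqrt{t / \log\log t}$ should stabilize near a multiple of $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.1
x = np.clip(rng.normal(0.0, sigma, size=200_000), -1.0, 1.0)  # bounded, low-variance
# Predictable prediction: lagged running mean (the first prediction is arbitrary).
xhat = np.concatenate(([0.0], np.cumsum(x)[:-1] / np.arange(1, len(x))))
lo, hi = eb_confidence_sequence(x, xhat, alpha=0.05)
radius = (hi - lo) / 2.0
for k in (10**3, 10**4, 10**5):
    # Rescaled radius: approaches a constant multiple of sigma as k grows.
    print(k, radius[k - 1] * np.sqrt(k / np.log(np.log(k))))
```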
4. Comparison to Fixed-Variance and Other Adaptive CSs
A sub-Gaussian (fixed-variance) CS built from a worst-case variance parameter $\sigma_{\max}^2$ (e.g., $\sigma_{\max}^2 = 1/4$ for $[0,1]$-valued data) produces widths of order
$$O\big(\sigma_{\max} \sqrt{\log\log t / t}\big),$$
which can be extremely conservative if the actual variance is small.
Empirical-Bernstein (variance-adaptive) CSs instead use the empirical $\widehat V_t$, sharply tightening intervals when the process is low-variance. For Bernoulli-$0.01$ data, sub-Gaussian CSs can be roughly five times wider than the empirical-Bernstein CS, since their widths scale with $\sigma_{\max} = 1/2$ rather than the true standard deviation $\sqrt{0.01 \cdot 0.99} \approx 0.1$ (Howard et al., 2018).
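The factor of roughly five follows from first-order width scaling alone, as the following check shows (pure arithmetic, not a quotation of the source's figure):

```python
import numpy as np

p = 0.01
sigma_true = np.sqrt(p * (1 - p))   # ~0.0995, actual Bernoulli(0.01) std dev
sigma_worst = 0.5                   # worst-case std dev for [0, 1]-valued data
print(sigma_worst / sigma_true)     # ~5.03: first-order width ratio, sub-Gaussian vs EB
```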
Variance-adaptive CSs have extensions for heavy-tailed and infinite-variance settings, such as Catoni-style CSs under a known variance bound or a $p$-th central moment bound with $p \in (1, 2]$ (Wang et al., 2022), and CSs exploiting nonnegativity constraints for heavy-tailed data (Mineiro, 2022).
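To make the Catoni-style approach concrete, here is a minimal sketch under a known variance bound, in the spirit of (Wang et al., 2022): the influence function is Catoni's, while the fixed-$\lambda$ heuristic, candidate grid, and helper names are illustrative choices.

```python
import numpy as np

def catoni_psi(x):
    """Catoni's 'wide' influence function: psi(x) = log(1 + x + x^2/2) for x >= 0
    and -log(1 - x + x^2/2) for x < 0; it grows only logarithmically, so a single
    heavy-tailed observation cannot dominate the sum."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, np.log1p(x + x * x / 2), -np.log1p(-x + x * x / 2))

def catoni_interval(xs, sigma2, alpha=0.05, lam=None, grid=None):
    """Interval at time t = len(xs) from the Catoni supermartingale test with a
    known variance bound sigma2. Here lam is tuned to the final t; a fully
    time-uniform deployment fixes lam in advance or mixes over it."""
    xs = np.asarray(xs, dtype=float)
    t = len(xs)
    if lam is None:
        lam = np.sqrt(2 * np.log(2 / alpha) / (sigma2 * t))   # heuristic tuning
    if grid is None:
        grid = np.linspace(xs.mean() - 5, xs.mean() + 5, 2001)
    # Ville threshold for the supermartingale exp(sum psi - t*lam^2*sigma2/2),
    # with alpha split across the two tails.
    thresh = np.log(2 / alpha) + t * lam ** 2 * sigma2 / 2
    stats = np.array([catoni_psi(lam * (xs - m)).sum() for m in grid])
    keep = np.abs(stats) < thresh
    if not keep.any():
        raise ValueError("candidate grid too narrow")
    return grid[keep].min(), grid[keep].max()
```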
5. Methodological Extensions and Matrix Generalizations
Recent developments yield closed-form, mixture-based empirical-Bernstein CSs for both scalar and matrix means:
- The latest closed-form variant (Chugg et al., 24 Dec 2025) constructs, for each $t$, a nonnegative supermartingale by mixing the empirical-Bernstein exponential process over its tuning parameter $\lambda$, and defines a width of the form
$$w_t = \frac{1}{t}\Big(\sqrt{c_1\, \widehat V_t\, L_t} + c_2\, L_t\Big),$$
with $\widehat V_t = \sum_{i=1}^{t} (X_i - \widehat X_i)^2$ the empirical variance process, explicit constants $c_1, c_2$, and an explicit log factor $L_t$ (depending on $\widehat V_t$ and $1/\alpha$).
- For a sequence of symmetric $d \times d$ matrices with almost surely bounded eigenvalues, the same polynomial structure yields a CS for the maximal eigenvalue deviation:
$$\Pr\Big(\exists t \ge 1 : \big\lVert \bar Y_t - \widetilde M_t \big\rVert_{\mathrm{op}} \ge w_t\Big) \le \alpha,$$
where $\lVert \cdot \rVert_{\mathrm{op}}$ is the operator (spectral) norm, $\bar Y_t$ is the running matrix average, $\widetilde M_t$ is the corresponding average of conditional means, and the log factor acquires an explicit dependence on the dimension $d$ (Chugg et al., 24 Dec 2025).
A key property of these new CSs is that, in the constant-mean, i.i.d. regime, the limiting width rescaled by $\sqrt{t / \log\log t}$ is independent of the confidence level $\alpha$, a provable improvement over previous closed-form constructions.
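The matrix ingredients above can be sketched schematically; this assumes symmetric matrices with bounded eigenvalues and predictable matrix predictions, and the scalarized variance proxy below stands in for the precise proxy, width, and $d$-dependent log factor of (Chugg et al., 24 Dec 2025).

```python
import numpy as np

def matrix_eb_ingredients(ys, preds):
    """Running matrix mean and a schematic empirical variance proxy for
    symmetric matrices Y_i with predictions P_i, where each P_i may depend
    only on Y_1, ..., Y_{i-1}. Schematic only: the exact proxy and width
    in Chugg et al. differ in constants and the dimension-dependent log factor."""
    t = len(ys)
    y_bar = sum(ys) / t                                          # bar Y_t
    err_sq = sum((y - p) @ (y - p) for y, p in zip(ys, preds))   # sum of squared errors
    v_hat = float(np.linalg.eigvalsh(err_sq)[-1])                # top-eigenvalue scalarization
    return y_bar, v_hat

# The deviation is then measured in operator norm, e.g.
# np.linalg.norm(y_bar - m_target, ord=2), and compared against a width
# built from v_hat and a log factor growing with the dimension d.
```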
6. Applications and Empirical Performance
Variance-adaptive CSs are widely applicable:
- Covariance matrix estimation
- Sample average treatment effect inference under the Neyman-Rubin potential outcomes model
- Bandit algorithms and A/B testing with continuous monitoring
- Adaptive and safe inference in reinforcement learning and online learning
- Sampling without replacement, yielding substantial improvements when the sample variance is much less than the worst-case variance of the population (Waudby-Smith et al., 2020)
- Linear bandits, where variance-adaptive CSs are used to build ellipsoidal confidence sets for the unknown parameter vector $\theta^{\star}$, with radii scaling with the sum of observed conditional variances (Jun et al., 12 Feb 2024)
Empirical studies (Chugg et al., 24 Dec 2025) show these CSs match or outperform previous variance-adaptive CSs while maintaining coverage over long time horizons. Performance gains are especially pronounced in low-variance, nonstationary, or time-varying-mean settings.
7. Theoretical and Practical Implications
Variance-adaptive CSs represent a sharp advance in anytime valid inference, combining:
- Time-uniform coverage with LIL-optimal shrinking
- Fully nonparametric applicability, using data-driven variance proxies
- The ability to handle non-i.i.d., martingale-dependent, and heavy-tailed settings (with appropriate extensions)
- Closed-form, practically implementable expressions (e.g., the latest mixture-Bernstein CS (Chugg et al., 24 Dec 2025))
- A robust foundation in self-normalized martingale concentration, typically via the method of mixtures (mixture supermartingales), Ville's inequality, and polynomial “stitching”
Their flexibility and optimality have positioned variance-adaptive CSs as standard primitives in modern sequential estimation, especially as uncertainty quantification tools in high-frequency, online, or nonstationary environments (Howard et al., 2018, Chugg et al., 24 Dec 2025, Wang et al., 2022, Mineiro, 2022, Waudby-Smith et al., 2020, Jun et al., 12 Feb 2024).