Asymptotic Anytime-Valid Confidence Sequences
- Asymptotic anytime-valid confidence sequences are intervals that guarantee high-probability parameter coverage uniformly over time in the large-sample limit.
- They are constructed via martingale techniques and strong invariance principles, achieving near-optimal CLT shrinkage rates even under arbitrary stopping.
- AA-CSs enable robust sequential analysis and adaptive stopping rules in applications such as A/B testing, stochastic approximation, and causal inference.
An asymptotic anytime-valid confidence sequence (AA-CS) is a sequence of intervals or sets for a target parameter, constructed so that, with high probability (at least $1-\alpha$), the parameter is contained in all intervals simultaneously, uniformly over time, after a suitable burn-in. This property is guaranteed even under continuous monitoring and arbitrary stopping rules. The "asymptotic" qualifier indicates that the time-uniform guarantee is achieved in the large-sample limit, under weak moment or regularity conditions, rather than for all finite samples as in classical nonasymptotic CSs. AA-CSs blend the strengths of CLT-based inference (minimal assumptions, $1-\alpha$ asymptotic coverage) with the time-uniform error control required for safe sequential analysis.
1. Foundations and Definitions
The definition of an asymptotic anytime-valid confidence sequence is formalized through two essential features: (i) time-uniform coverage and (ii) asymptotic sharpness. For a parameter of interest $\theta$, an AA-CS is a sequence of intervals $(C_t)_{t \ge 1}$ such that
$$\lim_{m\to\infty}\ \mathbb{P}\big(\theta \in C_t \ \text{for all } t \ge m\big)\ \ge\ 1-\alpha.$$
This means that, for a sufficiently large burn-in time $m$, the probability that $\theta$ is ever excluded after $m$ drops to at most $\alpha$, and the size of $C_t$ shrinks with $t$ at CLT-optimal or nearly-optimal rates. For example, for the mean of i.i.d. variables with finite variance, AA-CS widths are typically $O(\sqrt{\log t / t})$ or even $O(\sqrt{\log\log t / t})$ (Waudby-Smith et al., 2021, Gnettner et al., 14 Feb 2025, Waudby-Smith et al., 2023).
A key distinction from classical fixed-time intervals or nonasymptotic CSs is that AA-CSs remain correct under arbitrary stopping (including data-dependent rules), paying only a mild price in width for time-uniformity, and they impose weaker regularity restrictions than nonasymptotic CSs (which require, e.g., sub-Gaussian or bounded increments).
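The definitional contrast above can be checked numerically. Below is a minimal simulation, assuming the Robbins-mixture boundary form discussed later for the mean; the function names and parameter defaults are illustrative, not taken from any cited paper:

```python
import math
import random

def asympcs_halfwidth(t, sigma_hat, rho=1.0, alpha=0.1):
    """Half-width of a mixture-boundary asymptotic CS for a mean
    (Robbins-mixture form; rho is the tuning parameter)."""
    a = t * rho**2 + 1.0
    return sigma_hat * math.sqrt(2.0 * a / (t**2 * rho**2)
                                 * math.log(math.sqrt(a) / alpha))

def uniform_coverage(n_runs=200, horizon=2000, burn_in=30, alpha=0.1, seed=0):
    """Fraction of i.i.d. N(0,1) runs in which the true mean (0) stays
    inside the CS at *every* step from burn_in to horizon."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_runs):
        s = ss = 0.0
        ok = True
        for t in range(1, horizon + 1):
            x = rng.gauss(0.0, 1.0)
            s += x
            ss += x * x
            if t >= burn_in:
                mean = s / t
                var = max(ss / t - mean**2, 1e-12)
                if abs(mean) > asympcs_halfwidth(t, math.sqrt(var), alpha=alpha):
                    ok = False  # true mean excluded at some time: a miss
                    break
        covered += ok
    return covered / n_runs

cov = uniform_coverage()
```

Empirically, the time-uniform coverage stays at or above $1-\alpha$ for large burn-ins, whereas a pointwise CLT interval monitored at every step would be escaped with probability approaching one.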
2. Core Methodology: Martingale Construction and Asymptotic Approximation
The construction of AA-CSs is generally based on explicit strong invariance (coupling) principles, martingale super/submartingale constructions, and mixture martingale (Bayesian mixture) boundaries. The principal recipe, formalized in (Waudby-Smith et al., 2021, Gnettner et al., 14 Feb 2025), consists of the following steps:
- Step 1: Obtain a strong invariance principle (e.g., Strassen’s theorem) to couple the empirical process (e.g., partial sums of centered data) to a Gaussian process, uniformly over all times $t$, up to an almost-sure $o(\sqrt{t \log\log t})$ error.
- Step 2: Design an explicit, time-uniform Gaussian boundary (e.g., Robbins–Siegmund mixture or law-of-iterated-logarithm (LIL) boundary) that serves as an envelope for the coupled process.
- Step 3: Use plug-in variance or covariance estimators, supported by their strong consistency, to transfer the Gaussian boundary to the empirical process.
- Step 4: Invert the acceptance region of an associated e-process or test-martingale to obtain the AA-CS.
The mathematical core of these constructions exploits the predictability of the empirical process, the pathwise properties of strong invariance, and Ville’s maximal inequality or its asymptotic variant for supermartingales (Ramdas et al., 2022, Gnettner et al., 14 Feb 2025).
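The four-step recipe can be sketched as a small online estimator. This is a sketch under the assumption that the Robbins-mixture Gaussian boundary (Steps 1–2) is transferred to the data via a Welford plug-in variance (Step 3) and inverted into an interval for the mean (Step 4); the class name and defaults are illustrative:

```python
import math

class AsympCS:
    """Online asymptotic CS for a mean, mirroring the four-step recipe:
    a Gaussian mixture boundary (Steps 1-2) is transferred to the
    empirical process via a plug-in variance estimate (Step 3) and
    inverted into an interval for the mean (Step 4)."""

    def __init__(self, alpha=0.05, rho=1.0):
        self.alpha, self.rho = alpha, rho
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def interval(self):
        t, rho, alpha = self.n, self.rho, self.alpha
        sigma = math.sqrt(self.m2 / t) if t > 1 else 1.0
        a = t * rho**2 + 1.0
        h = sigma * math.sqrt(2.0 * a / (t**2 * rho**2)
                              * math.log(math.sqrt(a) / alpha))
        return self.mean - h, self.mean + h

cs = AsympCS()
for x in [0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1, 0.25]:
    cs.update(x)
lo, hi = cs.interval()
```

Because the interval is recomputed after every observation and the boundary is time-uniform, the analyst may stop at any data-dependent time without invalidating coverage.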
3. Explicit Forms and Shrinkage Rates
The canonical instantiation for the population mean $\mu$ of an i.i.d. sequence with finite variance $\sigma^2$ is
$$\bar X_t \ \pm\ \hat\sigma_t \sqrt{\frac{2(t\rho^2+1)}{t^2\rho^2}\,\log\!\left(\frac{\sqrt{t\rho^2+1}}{\alpha}\right)},$$
where $\bar X_t$ and $\hat\sigma_t$ are the sample mean and sample standard deviation, and $\rho > 0$ is a tunable parameter chosen to minimize boundary width at a desired horizon (Waudby-Smith et al., 2021, Maharaj et al., 2023, Waudby-Smith et al., 2023). This boundary is valid uniformly over all times, is minimax rate-optimal in width, and shrinks at rate $O(\sqrt{\log t / t})$, closely matching classical CLT intervals.
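The shrinkage rate and the role of $\rho$ are easy to examine directly. The snippet below, a sketch using the canonical boundary form above with a crude grid search (not an optimal tuning procedure), shows the $\sqrt{\log t / t}$-style decay and how $\rho$ trades early against late tightness:

```python
import math

def halfwidth(t, rho, alpha=0.05, sigma=1.0):
    """Boundary half-width at time t for tuning parameter rho."""
    a = t * rho**2 + 1.0
    return sigma * math.sqrt(2.0 * a / (t**2 * rho**2)
                             * math.log(math.sqrt(a) / alpha))

# Width shrinks at roughly sqrt(log t / t):
w = {t: halfwidth(t, rho=1.0) for t in (100, 1_000, 10_000)}

# rho controls where the boundary is tightest: pick rho minimizing the
# width at a target horizon t0 by grid search (illustrative only).
t0 = 1_000
best_rho = min((0.01 * k for k in range(1, 500)),
               key=lambda r: halfwidth(t0, r))
```

Smaller $\rho$ values defer tightness to later times, which is why the optimizing $\rho$ shrinks as the target horizon grows.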
Sharper variants with adaptivity or improved sharpness can be obtained via alternative choices of the boundary-shape function within a flexible boundary family (Gnettner et al., 14 Feb 2025). In high-dimensional or multi-parameter settings (such as the mean vector of a stochastic approximation), explicit multivariate boundaries based on the LIL or Gaussian-mixture argument achieve asymptotic time-uniform validity (Xie et al., 2024).
In structured settings, such as parametric models or linear regression, AA-CSs can be derived via inversion of sequential likelihood ratios, test martingales, or mixture e-processes, yielding intervals that asymptotically coincide with fixed-sample Wald intervals in both position and width (Wang et al., 2023, Lindon et al., 2022, Ramdas et al., 2022).
4. Robustness, Uniformity, and Model Requirements
A central strength of the AA-CS paradigm is robustness to distributional deviations: it applies under minimal assumptions (finite second moments, martingale dependence), and can be proved both pointwise (for a given law $P$) and uniformly over classes of laws $\mathcal{P}$. The notion of a distribution-uniform AA-CS, formalized by (Waudby-Smith et al., 2023), ensures that the asymptotic time-uniform error control holds simultaneously over all $P \in \mathcal{P}$, under the existence of sufficient moments. For instance, the 'Robbins–Siegmund' boundary with an empirical variance plug-in achieves such $\mathcal{P}$-uniformity with only a mild logarithmic penalty in boundary width.
For regression and double/debiased machine learning (DML) settings (Dalal et al., 2024, Lindon et al., 2022), the resulting AA-CSs require only that cross-fitted nuisance estimation errors decay faster than $n^{-1/4}$, and they remain valid under arbitrary stopping/tuning, a crucial property for modern adaptive or online causal inference pipelines.
AA-CSs have also been developed and analyzed for nonparametric or heavy-tailed settings, including explicit lower confidence sequences for right heavy-tailed data (Mineiro, 2022), and multinomial or multivariate cases (Lindon et al., 2020).
5. Practical Stopping Rules and Applications
The time-uniform property of AA-CSs enables the design of stopping rules that are both adaptive and statistically principled. The usual strategy is to halt when the width of the CS drops below a user-specified threshold, or when the null is excluded (Aolaritei et al., 15 Dec 2025, Maharaj et al., 2023).
For example, in stochastic approximation and SGD, the stopping time
$$\tau_\varepsilon \;=\; \inf\{t : U_t \le \varepsilon\},$$
where $U_t$ is an upper AA-CS bound on the average suboptimality, is almost surely finite, and at $\tau_\varepsilon$ the suboptimality guarantee holds with probability at least $1-\alpha$ (Aolaritei et al., 15 Dec 2025).
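A width-threshold stopping rule of this kind can be sketched for mean estimation, assuming the Robbins-mixture boundary from Section 3; the target mean, noise model, and defaults are illustrative:

```python
import math
import random

def halfwidth(t, sigma, rho=0.5, alpha=0.05):
    """CS half-width at time t (Robbins-mixture boundary form)."""
    a = t * rho**2 + 1.0
    return sigma * math.sqrt(2.0 * a / (t**2 * rho**2)
                             * math.log(math.sqrt(a) / alpha))

def stop_when_narrow(eps=0.2, alpha=0.05, max_t=100_000, seed=1):
    """Collect data until the CS half-width drops below eps; the
    time-uniform guarantee makes this data-dependent stop safe."""
    rng = random.Random(seed)
    s = ss = 0.0
    for t in range(1, max_t + 1):
        x = rng.gauss(0.5, 1.0)   # unknown true mean 0.5 (illustrative)
        s += x
        ss += x * x
        if t >= 10:
            mean = s / t
            sigma = math.sqrt(max(ss / t - mean**2, 1e-12))
            if halfwidth(t, sigma, alpha=alpha) <= eps:
                return t, mean
    return None

tau, est = stop_when_narrow()
```

Because the coverage bound holds simultaneously over all times, the interval reported at the random time $\tau$ inherits the same $1-\alpha$ guarantee, which would fail for a naively monitored fixed-time interval.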
AA-CSs now underpin several deployed and large-scale experimental platforms, including A/B testing infrastructure for Adobe Experience Platform (Maharaj et al., 2023), and sequential analyses in contemporary industrial settings; they are vital for adaptive experimentation, best-arm identification, and anytime-robust estimation in sequential machine learning and statistics (Cho et al., 31 Dec 2025, Liang et al., 2023).
A table listing representative AA-CS constructions is included below for reference.
| Paper | Problem Type | AA-CS Boundary (width) | Main Assumptions |
|---|---|---|---|
| (Waudby-Smith et al., 2021) | Mean, martingale/ATE | $O(\sqrt{\log t / t})$ | finite variance, strong invariance |
| (Gnettner et al., 14 Feb 2025) | Mean, flexible boundary | $O(\sqrt{\log t / t})$, many boundary choices | i.i.d., CLT, consistent variance estimator |
| (Waudby-Smith et al., 2023) | Mean, $\mathcal{P}$-uniform | $O(\sqrt{\log t / t})$ with logarithmic penalty | finite moments |
| (Xie et al., 2024) | SA/SGD, multivariate | Gaussian-mixture, LIL, LIL-$\varepsilon$-net | regularity, CLT-type conditions |
| (Aolaritei et al., 15 Dec 2025) | Projected SGD, convex | explicit data-dependent suboptimality bound | convexity, sub-Gaussian noise, bounded domain |
| (Maharaj et al., 2023) | A/B, ratio/lift, streaming | as above | finite $2+\delta$ moments |
6. Sharpness, Optimality, and Limitations
The term "sharp" AA-CS (see (Gnettner et al., 14 Feb 2025)) refers to sequences for which the coverage is exactly $1-\alpha$ in the large-sample limit—i.e., the limiting coverage is not conservative. The selection of boundary scaling and normalization constants is critical: the boundary height must match the distributional supremum of the corresponding functional of Brownian motion (e.g., via quantiles of $\sup_t |W(t)|/g(t)$ for a chosen boundary shape $g$), often requiring numerical calibration.
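The numerical calibration step can be sketched by Monte Carlo. The snippet below is schematic: it approximates Brownian motion by a Gaussian random walk and estimates the $(1-\alpha)$-quantile of the normalized supremum for an assumed, LIL-flavoured boundary shape $g$ (the shape and discretization are illustrative; in practice one refines both):

```python
import math
import random

def calibrate_boundary(g, horizon=1_000, n_paths=400, alpha=0.1, seed=0):
    """Monte Carlo estimate of the (1 - alpha)-quantile of
    sup_{1 <= t <= horizon} |W(t)| / g(t) for a boundary shape g,
    approximating Brownian motion by a Gaussian random walk."""
    rng = random.Random(seed)
    sups = []
    for _ in range(n_paths):
        w, m = 0.0, 0.0
        for t in range(1, horizon + 1):
            w += rng.gauss(0.0, 1.0)       # discrete Brownian increment
            m = max(m, abs(w) / g(t))      # running normalized supremum
        sups.append(m)
    sups.sort()
    return sups[int(math.ceil((1 - alpha) * n_paths)) - 1]

# A LIL-flavoured shape, under which the normalized supremum has a
# nondegenerate law over long horizons:
c = calibrate_boundary(lambda t: math.sqrt(t * (1.0 + math.log(t))))
```

The calibrated constant $c$ then scales the boundary so that the limiting crossing probability matches $\alpha$ exactly, which is what distinguishes a sharp AA-CS from a conservative one.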
Optimality is generally achieved in the minimax sense: the boundary widths are of the same order as the fixed-time CLT or LIL rates ($O(\sqrt{\log t / t})$, down to $O(\sqrt{\log\log t / t})$ for sub-Gaussian or LIL settings) (Ramdas et al., 2022). Results in (Waudby-Smith et al., 2023) show that under only finite $q$-th moment conditions, the AA-CS width is at most a logarithmic factor slower than the best-possible rate.
An essential limitation is that, for small sample sizes, AA-CSs may not provide coverage guarantees (coverage is only asymptotic), in contrast to nonasymptotic CSs, which are necessarily wider but hold for all finite samples. This reflects a trade-off: AA-CSs deliver CLT-style optimal widths and broad applicability, at the expense of finite-sample guarantees.
7. Extensions and Current Research Frontiers
AA-CS theory has been extended to a broad range of settings, including:
- Conditional means and causal parameters: sequential ATE and LATE estimation with regression adjustments or DML (Dalal et al., 2024, Lindon et al., 2022).
- Adaptive and contextual bandits: AA-CSs for ATE and best-arm identification, incorporating sequential regression and variance reduction (Cho et al., 31 Dec 2025, Liang et al., 2023).
- Continuous-time and Poisson processes: AA-CSs for cumulative rates and arrival intensities (Lindon et al., 2024).
- Machine learning and SGD: uniform certificates for suboptimality, data-dependent stopping, and adaptation to convex (but not strongly convex) objectives (Aolaritei et al., 15 Dec 2025).
Recent work has also established AA-CSs with distributional uniformity (valid simultaneously over entire parametric or nonparametric classes) (Waudby-Smith et al., 2023), and explored the connection to optimal sequential tests, minimaxity, and information-theoretic lower bounds.
Open problems include explicit nonasymptotic CSs for stochastic approximation algorithms, rate-optimal AA-CSs under heavy-tailed or dependent data, and adaptive boundaries calibrated to unknown self-normalized variances. Theoretical development continues to focus on balancing sharpness, robustness, and finite-sample validity in ever-higher-dimensional and less regular data regimes.