Sequential Hypothesis Testing E-valuators
- Sequential hypothesis testing-based e-valuators are anytime-valid statistical methods that use e-values to construct nonnegative supermartingales for rigorous Type I error control.
- They dynamically adapt through online convex optimization and mixture strategies, optimizing sample complexity and achieving exponential power growth under alternative hypotheses.
- This framework unifies frequentist and Bayesian inference, supporting applications in online learning, adaptive trials, and real-time decision systems.
Sequential hypothesis testing-based e-valuators are anytime-valid statistical procedures that evaluate hypotheses using streams of data by dynamically updating a process—usually a nonnegative supermartingale—constructed from so-called e-values or e-variables. These methods allow valid inference under optional stopping, adversarial data collection, and adaptivity, and are particularly powerful for nonparametric, composite, and online testing tasks. The e-value paradigm unifies testing, confidence sequences, and log-optimal betting strategies, with foundational results ensuring rigorous Type I error control, compatibility with online convex optimization, and systematic construction schemas for a wide variety of testing problems.
1. Mathematical Foundations of Sequential E-valuators
Let $\mathcal{P}$ denote a family of distributions over a sample space $\mathcal{X}$, and let $\theta^*:\mathcal{P}\to\Theta$ be an elicitable and identifiable functional of interest (e.g., mean, quantile, expectile), specified via a strictly consistent scoring rule $S:\Theta\times\mathcal{X}\to\mathbb{R}$ such that, for all $P\in\mathcal{P}$, the population risk
$\mathbb{E}_{X\sim P}[S(\theta, X)]$
is uniquely minimized at $\theta = \theta^*(P)$ (Casgrain et al., 2022). The identification function $V:\Theta\times\mathcal{X}\to\mathbb{R}^d$ links the scoring rule to a vector-valued root characterization: $\theta^*(P)$ satisfies $\mathbb{E}_{X\sim P}[V(\theta^*(P), X)] = 0$.
A sequential e-value process (or e-process) is a nonnegative adapted process $(M_n)_{n\ge 0}$ with $M_0 = 1$ in a filtration $(\mathcal{F}_n)_{n\ge 0}$ such that, under the null hypothesis $H_0:\theta^*(P)=\theta_0$,
$\mathbb{E}_P[M_\tau] \le 1 \quad \text{for every stopping time } \tau.$
The one-step "betting" increments are constructed as
$E_n = 1 + \lambda_n^\top V(\theta_0, X_n),$
with $(\lambda_n)_{n\ge 1}$ a predictable sequence chosen to ensure $E_n \ge 0$ and $\mathbb{E}_P[E_n \mid \mathcal{F}_{n-1}] \le 1$ under $H_0$. The resulting product $M_n = \prod_{i=1}^n E_i$ is a nonnegative supermartingale and thus a valid e-process (Casgrain et al., 2022, Ramdas et al., 2024).
More generally, for any sequential e-variables $E_1, E_2, \ldots$ adapted to $(\mathcal{F}_n)_{n\ge 1}$ with $\mathbb{E}_P[E_n \mid \mathcal{F}_{n-1}] \le 1$ under $H_0$, the product $M_n = \prod_{i=1}^n E_i$ forms an e-process (Arnold et al., 2021). This construction is central to the sequential hypothesis testing-based e-valuator paradigm.
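The product construction can be illustrated concretely. The following minimal sketch (our own toy example, not code from the cited papers) uses the simplest possible e-variable, a likelihood ratio for a single coin flip under the null Bernoulli(0.5) against a fixed Bernoulli(0.7) alternative; each partial product of such e-variables is itself an e-value, so the running product traces out an e-process:

```python
def e_variable(x, q=0.7, p0=0.5):
    # Likelihood-ratio e-variable for one coin flip x in {0, 1}:
    # density of Bernoulli(q) divided by density of Bernoulli(p0).
    # Under the null Bernoulli(p0), its expectation is exactly 1.
    return q / p0 if x == 1 else (1 - q) / (1 - p0)

def e_process(flips):
    # Running product of per-observation e-variables; every partial
    # product is again an e-value, so the path is an e-process.
    wealth, path = 1.0, []
    for x in flips:
        wealth *= e_variable(x)
        path.append(wealth)
    return path
```

The key invariant is that each factor has null expectation one, so the product never gains expected value under $H_0$ no matter how the data arrive.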
2. Anytime-Validity and Type I Error Control
Sequential e-valuators provide rigorous Type I error control at any stopping time via Ville's inequality:
$P\left(\sup_{n\ge 0} M_n \ge 1/\alpha\right) \le \alpha$
for any $\alpha \in (0,1)$ and any $P$ in the null (Casgrain et al., 2022, Ramdas et al., 2024, Arnold et al., 2021). Consequently, the stopping rule
$\tau_\alpha = \inf\{n \ge 1 : M_n \ge 1/\alpha\}$
defines an $\alpha$-level anytime-valid sequential test. This property is robust to optional stopping and continuation, stopping for arbitrary data-dependent reasons, and even certain forms of adversarial data collection.
In composite or nonparametric settings, predictable mixture strategies over a family of base supermartingales or e-values are often employed. Here, the e-process is
$M_n = \int_\Lambda M_n^\lambda \, \mathrm{d}\pi(\lambda),$
with $\pi$ an $\mathcal{F}_0$-measurable probability measure over the index set $\Lambda$ (Casgrain et al., 2022). Mixture-of-experts or online-convex-optimization (OCO) strategies replace $\pi$ with a predictable sequence $(\pi_n)$ selected to adaptively maximize expected log-growth or minimize regret.
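A static mixture can be sketched with a finite grid in place of the integral. In this toy illustration (our assumption: a Bernoulli null and a uniform mixing measure over three candidate alternatives), each component is a likelihood-ratio wealth process and their average is again an e-process:

```python
def mixture_e_process(flips, alternatives=(0.6, 0.7, 0.8), p0=0.5):
    # Uniform finite mixture over a grid of Bernoulli alternatives q.
    # Each component wealth is a likelihood-ratio martingale under the
    # null Bernoulli(p0); the average of nonnegative processes with
    # null expectation <= 1 again has null expectation <= 1.
    wealth = {q: 1.0 for q in alternatives}
    path = []
    for x in flips:
        for q in alternatives:
            lr = q / p0 if x == 1 else (1 - q) / (1 - p0)
            wealth[q] *= lr
        path.append(sum(wealth.values()) / len(alternatives))
    return path
```

The mixture hedges over the unknown alternative: it grows exponentially whenever any grid point is close to the true data-generating bias, at the cost of a bounded (logarithmic-in-grid-size) loss versus the best component.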
3. Design and Algorithmic Structure
The sequential e-valuator framework is algorithmically instantiated as follows (Casgrain et al., 2022, Ramdas et al., 2024):
- Initialization: Set $M_0 = 1$ and choose a target significance level $\alpha \in (0,1)$.
- Observation: For $n = 1, 2, \ldots$, observe new data $X_n$.
- Adaptation: Compute $\lambda_n$ (or the mixing measure $\pi_n$ for mixture strategies) as a function of $X_1, \ldots, X_{n-1}$.
- Increment Computation: $E_n = 1 + \lambda_n^\top V(\theta_0, X_n)$.
- Update: $M_n = M_{n-1} E_n$.
- Stopping Rule: Halt and reject $H_0$ if $M_n \ge 1/\alpha$.
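The loop above can be sketched end to end for the canonical bounded-mean problem. This is a minimal illustration under our own assumptions (one-sided null $H_0: \mathbb{E}[X] \le \mu_0$ for $X \in [0,1]$; the crude plug-in bet and the clipping constant `c` are illustrative choices, not a prescription from the cited papers):

```python
def sequential_mean_test(stream, mu0=0.5, alpha=0.05, c=0.5):
    """Anytime-valid sequential test of H0: E[X] <= mu0 for X in [0, 1].

    The increment E_n = 1 + lam_n * (X_n - mu0) has null expectation
    <= 1 whenever lam_n >= 0, and stays strictly positive because
    lam_n is clipped to [0, 0.9 / mu0]. The bet lam_n is predictable:
    it uses only observations strictly before X_n. Requires 0 < mu0 < 1.
    """
    wealth, total, n = 1.0, 0.0, 0
    for x in stream:
        mean_so_far = total / n if n > 0 else mu0      # predictable plug-in
        lam = max(0.0, min(c * (mean_so_far - mu0) / (mu0 * (1 - mu0)),
                           0.9 / mu0))                  # adaptation
        wealth *= 1.0 + lam * (x - mu0)                 # increment + update
        n, total = n + 1, total + x
        if wealth >= 1.0 / alpha:                       # Ville threshold
            return True, n, wealth                      # reject H0
    return False, n, wealth                             # never crossed 1/alpha
```

When the stream truly has mean above $\mu_0$, the wealth compounds multiplicatively and the threshold $1/\alpha$ is crossed quickly; when the null holds, the wealth rarely grows and the test simply never stops.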
In higher-dimensional or non-bounded data settings, sub-$\psi$ cumulant generating function bounds allow construction of exponential supermartingales:
$M_n^\lambda = \exp\left(\lambda S_n - \psi(\lambda)\, V_n\right),$
where $S_n$ is a centered cumulative sum and $V_n$ an accumulated variance process; predictable mixtures over $\lambda$ and the tuning parameters yield e-processes robust to sub-exponential or heavy-tailed data (Casgrain et al., 2022).
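In the simplest sub-Gaussian case, $\psi(\lambda) = \lambda^2 \sigma^2 / 2$ and $V_n = n$, and a finite uniform mixture over a grid of $\lambda$ values gives a concrete e-process. A minimal sketch (the grid of four $\lambda$ values is our illustrative choice):

```python
import math

def subgaussian_e_process(xs, mu0=0.0, sigma=1.0,
                          lambdas=(0.1, 0.25, 0.5, 1.0)):
    # Uniform mixture of exponential supermartingales for data assumed
    # sigma-sub-Gaussian under H0: E[X] = mu0. For each lambda,
    # exp(lambda * S_n - n * lambda**2 * sigma**2 / 2) is a nonnegative
    # supermartingale, so the average over the grid is again an e-process.
    s, path = 0.0, []
    for n, x in enumerate(xs, start=1):
        s += x - mu0
        mix = sum(math.exp(lam * s - n * lam * lam * sigma ** 2 / 2)
                  for lam in lambdas) / len(lambdas)
        path.append(mix)
    return path
```

Under the null the quadratic penalty dominates and the wealth drifts down; under a shifted mean, the $\lambda$ closest to the optimal bet drives exponential growth of the mixture.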
For broad composite nulls, the family of e-processes is often mixed with respect to a prior or adapted with online learning (Waudby-Smith et al., 3 Apr 2025, Ramdas et al., 2024). Portfolios of e-processes with sublinear "regret" versus the oracle best in hindsight yield universal log-optimality (Waudby-Smith et al., 3 Apr 2025).
4. Power Analysis and Regret Bounds
Under composite alternatives, sequential e-valuators achieve exponential power growth if there exists $\theta \in \Theta$ such that
$\liminf_{n\to\infty} \frac{1}{n} \log L_n^\theta > 0 \quad \text{almost surely under the alternative},$
and if the regret,
$\mathrm{Regret}_n = \max_{\theta\in\Theta}\left(\log L_n^\theta - \log M_n\right), \qquad \text{(for mixture $M_n$)}$
is $o(n)$ almost surely. Results from OCO (e.g., Follow-the-Leader, FTRL, OGD) guarantee sublinear regret and thus asymptotic power one (Casgrain et al., 2022, Waudby-Smith et al., 3 Apr 2025):
- Asymptotic Growth: For log-optimal betting strategies,
$\frac{1}{n} \log M_n \to r^*(P)$
almost surely under the alternative $P$, with $r^*(P) = \sup_{\theta\in\Theta} \mathbb{E}_P[\log E_1^\theta]$ the maximal expected log-increment (Waudby-Smith et al., 3 Apr 2025).
- Stopping-Time Efficiency: The expected stopping time under any $P$ in the alternative satisfies
$\mathbb{E}_P[\tau_\alpha] = \frac{\log(1/\alpha)}{r^*(P)}\,(1 + o(1)) \quad \text{as } \alpha \to 0,$
i.e., the test achieves optimal expected sample complexity, matching the pointwise Kelly rate (Waudby-Smith et al., 3 Apr 2025).
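The Kelly-rate sample complexity can be checked empirically. In this sketch (our own Monte Carlo construction, with illustrative parameters $q = 0.8$, $p_0 = 0.5$, $\alpha = 0.01$), the oracle likelihood-ratio bet is played against Bernoulli data drawn from the alternative, where $r^*$ equals the Kullback-Leibler divergence $\mathrm{KL}(q \| p_0)$:

```python
import math
import random

def kelly_stopping_time(q=0.8, p0=0.5, alpha=0.01, trials=200, seed=0):
    # Monte Carlo check of the Kelly-rate sample complexity: betting
    # the oracle likelihood ratio for Bernoulli(q) against the null
    # Bernoulli(p0) on data truly drawn from Bernoulli(q), the mean
    # time to cross 1/alpha should be close to log(1/alpha) / KL(q||p0).
    rng = random.Random(seed)
    kl = q * math.log(q / p0) + (1 - q) * math.log((1 - q) / (1 - p0))
    target = math.log(1.0 / alpha)
    times = []
    for _ in range(trials):
        log_wealth, n = 0.0, 0
        while log_wealth < target:
            x = rng.random() < q
            log_wealth += (math.log(q / p0) if x
                           else math.log((1 - q) / (1 - p0)))
            n += 1
        times.append(n)
    return sum(times) / trials, target / kl
```

The empirical average slightly exceeds the prediction because the log-wealth overshoots the threshold at the crossing step; the ratio tends to one as $\alpha \to 0$.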
5. Applicability and Scope
Sequential hypothesis testing-based e-valuators, as developed in (Casgrain et al., 2022), cover
- Means and mean ratios under boundedness or sub-Gaussian/sub-exponential conditions,
- Quantiles and expectiles including VaR, multi-quantile vectors, and regression functionals,
- Goodness-of-fit, forecast calibration, and probabilistic model assessment in nonparametric setups (Arnold et al., 2021, Henzi et al., 2021),
- High-dimensional and ML-driven tests (e.g., deep classification two-sample, online independence and conditional independence) with nonparametric power via learned betting strategies (Pandeva et al., 2023, Pandeva et al., 2022).
The framework is also extensible to online and multiple testing settings—where e-values per hypothesis are multiplied or combined via closed testing or "discovery matrix" procedures for strong FWER/FDP control (Fischer et al., 2024, Vovk et al., 2020, Hartog et al., 15 Jan 2025).
6. Robustness, Extensions, and Practical Considerations
Robustness to unbounded or heavy-tailed data is handled via sub-$\psi$ cumulant generating function bounds; e-processes can be constructed in exponential form, and mixtures over parameter or tuning spaces can be performed using online convex optimization (Casgrain et al., 2022). The same formalism admits sequential testing with estimation, confidence sequences, and stopping-time statements.
Advantages include:
- Anytime-valid error control: Type I error control is uniform over all (possibly data-adaptive) stopping rules;
- Direct interpretability of evidence: The process $M_n$ can be regarded as accumulated "betting capital" against $H_0$;
- Flexible, adaptive power optimization: OCO or hedged mixtures allow minimax or log-optimal adaptivity over alternatives;
- Compatibility with decision-theoretic and modern ML protocols, supporting test design via virtually any scoring rule or identification functional.
Prominent limitations and considerations include:
- The requirement that the null functional be explicitly elicitable and have a strict identification function.
- Power may be below fixed-sample tests in specific dependence structures or high-lag blocking (Casgrain et al., 2022, Henzi et al., 2021).
- Tuning of betting fractions or mixture priors is critical for power; algorithmic learnability conditions are needed for some log-optimality results.
For particular tasks such as sequential forecast calibration or nonparametric two-sample testing, specialized construction and batching strategies provide empirical evidence of strong power and low Type I error in both simulated and real-data settings (Arnold et al., 2021, Pandeva et al., 2022).
7. Connections and Impact
This methodology unifies frequentist and Bayesian perspectives: e-values generalize p-values (via $1/E$ calibration) but permit optional continuation and combination; in the simple hypothesis case, e-values are likelihood ratios and coincide with Bayes factors under suitable priors (Ramdas et al., 2024, Grünwald et al., 2019). The interpretation via test supermartingales ties e-valuator procedures to game-theoretic probability and log-optimal portfolio constructions.
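The $1/E$ calibration mentioned above is a one-liner in practice; validity follows directly from Markov's inequality, since $P(E \ge 1/\alpha) \le \alpha\,\mathbb{E}[E] \le \alpha$ under the null. A minimal sketch:

```python
def e_to_p(e_value):
    # Universal e-to-p calibration: p = min(1, 1/E) is a valid p-value
    # whenever E is an e-value, by Markov's inequality under the null.
    if e_value < 0:
        raise ValueError("an e-value must be nonnegative")
    return min(1.0, 1.0 / e_value) if e_value > 0 else 1.0
```

Unlike a p-value, the underlying e-value may still be multiplied by further e-values collected later without invalidating the guarantee, which is precisely the optional-continuation property the passage describes.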
Sequential e-valuators underpin modern approaches to universal inference, multistage experimental design, adaptive clinical trials, online learning, and real-time decision systems. Their generality, statistical safety, and universality under optional stopping make them central tools for rigorous sequential inference in contemporary statistics and machine learning (Casgrain et al., 2022, Ramdas et al., 2024, Waudby-Smith et al., 3 Apr 2025, Pandeva et al., 2023).