Sequential Hypothesis Testing E-valuators
- Sequential hypothesis testing-based e-valuators are anytime-valid statistical methods that use e-values to construct nonnegative supermartingales for rigorous Type I error control.
- They dynamically adapt through online convex optimization and mixture strategies, optimizing sample complexity and achieving exponential power growth under alternative hypotheses.
- This framework unifies frequentist and Bayesian inference, supporting applications in online learning, adaptive trials, and real-time decision systems.
Sequential hypothesis testing-based e-valuators are anytime-valid statistical procedures that evaluate hypotheses using streams of data by dynamically updating a process—usually a nonnegative supermartingale—constructed from so-called e-values or e-variables. These methods allow valid inference under optional stopping, adversarial data collection, and adaptivity, and are particularly powerful for nonparametric, composite, and online testing tasks. The e-value paradigm unifies testing, confidence sequences, and log-optimal betting strategies, with foundational results ensuring rigorous Type I error control, compatibility with online convex optimization, and systematic construction schemas for a wide variety of testing problems.
1. Mathematical Foundations of Sequential E-valuators
Let $\mathcal{P}$ denote a family of distributions over a sample space $\mathcal{X}$, and let $\theta^*:\mathcal{P}\to\Theta$ be an elicitable and identifiable functional of interest (e.g., mean, quantile, expectile), specified via a strictly consistent scoring rule $S:\Theta\times\mathcal{X}\to\mathbb{R}$ such that, for all $P\in\mathcal{P}$, the population risk
$\mathbb{E}_{X\sim P}[S(\theta, X)]$
is uniquely minimized at $\theta = \theta^*(P)$ (Casgrain et al., 2022). The identification function $V:\Theta\times\mathcal{X}\to\mathbb{R}^d$ links the scoring rule to a vector-valued root characterization: $\theta^*(P)$ satisfies $\mathbb{E}_{X\sim P}[V(\theta^*(P), X)] = 0$.
A sequential e-value process (or e-process) is a nonnegative adapted process $(M_n)_{n\ge 0}$ with $M_0 = 1$ in a filtration $(\mathcal{F}_n)_{n\ge 0}$ such that, under the null hypothesis $H_0:\theta^*(P)=\theta_0$,
$\mathbb{E}_P[M_\tau] \le 1 \quad \text{for every stopping time } \tau.$
The one-step "betting" increments are constructed as
$E_n = 1 + \lambda_n^\top V(\theta_0, X_n),$
with $(\lambda_n)_{n\ge 1}$ a predictable sequence chosen to ensure $E_n \ge 0$ and $\mathbb{E}_P[E_n \mid \mathcal{F}_{n-1}] \le 1$ under $H_0$. The resulting product $M_n = \prod_{i=1}^n E_i$ is a nonnegative supermartingale and thus a valid e-process (Casgrain et al., 2022, Ramdas et al., 2024).
More generally, for any sequential e-variables $E_1, E_2, \ldots$ adapted to $(\mathcal{F}_n)_{n\ge 1}$ with $\mathbb{E}_P[E_n \mid \mathcal{F}_{n-1}] \le 1$ under $H_0$, the product $M_n = \prod_{i=1}^n E_i$ forms an e-process (Arnold et al., 2021). This construction is central to the sequential hypothesis testing-based e-valuator paradigm.
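The product construction can be illustrated concretely. The following minimal sketch (our own toy example, not code from the cited papers) uses the simplest possible e-variable, a likelihood ratio for a single coin flip under the null Bernoulli(0.5) against a fixed Bernoulli(0.7) alternative; each partial product of such e-variables is itself an e-value, so the running product traces out an e-process:

```python
def e_variable(x, q=0.7, p0=0.5):
    # Likelihood-ratio e-variable for one coin flip x in {0, 1}:
    # density of Bernoulli(q) divided by density of Bernoulli(p0).
    # Under the null Bernoulli(p0), its expectation is exactly 1.
    return q / p0 if x == 1 else (1 - q) / (1 - p0)

def e_process(flips):
    # Running product of per-observation e-variables; every partial
    # product is again an e-value, so the path is an e-process.
    wealth, path = 1.0, []
    for x in flips:
        wealth *= e_variable(x)
        path.append(wealth)
    return path
```

The key invariant is that each factor has null expectation one, so the product never gains expected value under $H_0$ no matter how the data arrive.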
2. Anytime-Validity and Type I Error Control
Sequential e-valuators provide rigorous Type I error control at any stopping time via Ville's inequality:
$P\left(\sup_{n\ge 0} M_n \ge 1/\alpha\right) \le \alpha$
for any $\alpha \in (0,1)$ and any $P$ in the null (Casgrain et al., 2022, Ramdas et al., 2024, Arnold et al., 2021). Consequently, the stopping rule
$\tau_\alpha = \inf\{n \ge 1 : M_n \ge 1/\alpha\}$
defines an $\alpha$-level anytime-valid sequential test. This property is robust to optional stopping and continuation, stopping for arbitrary data-dependent reasons, and even certain forms of adversarial data collection.
In composite or nonparametric settings, predictable mixture strategies over a family of base supermartingales or e-values are often employed. Here, the e-process is
$M_n = \int_\Lambda M_n^\lambda \, \mathrm{d}\pi(\lambda),$
with $\pi$ an $\mathcal{F}_0$-measurable probability measure over the index set $\Lambda$ (Casgrain et al., 2022). Mixture-of-experts or online-convex-optimization (OCO) strategies replace $\pi$ with a predictable sequence $(\pi_n)$ selected to adaptively maximize expected log-growth or minimize regret.
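A static mixture can be sketched with a finite grid in place of the integral. In this toy illustration (our assumption: a Bernoulli null and a uniform mixing measure over three candidate alternatives), each component is a likelihood-ratio wealth process and their average is again an e-process:

```python
def mixture_e_process(flips, alternatives=(0.6, 0.7, 0.8), p0=0.5):
    # Uniform finite mixture over a grid of Bernoulli alternatives q.
    # Each component wealth is a likelihood-ratio martingale under the
    # null Bernoulli(p0); the average of nonnegative processes with
    # null expectation <= 1 again has null expectation <= 1.
    wealth = {q: 1.0 for q in alternatives}
    path = []
    for x in flips:
        for q in alternatives:
            lr = q / p0 if x == 1 else (1 - q) / (1 - p0)
            wealth[q] *= lr
        path.append(sum(wealth.values()) / len(alternatives))
    return path
```

The mixture hedges over the unknown alternative: it grows exponentially whenever any grid point is close to the true data-generating bias, at the cost of a bounded (logarithmic-in-grid-size) loss versus the best component.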
3. Design and Algorithmic Structure
The sequential e-valuator framework is algorithmically instantiated as follows (Casgrain et al., 2022, Ramdas et al., 2024):
- Initialization: Set $M_0 = 1$ and choose a target significance level $\alpha \in (0,1)$.
- Observation: For $n = 1, 2, \ldots$, observe new data $X_n$.
- Adaptation: Compute $\lambda_n$ (or the mixing measure $\pi_n$ for mixture strategies) as a function of $X_1, \ldots, X_{n-1}$.
- Increment Computation: $E_n = 1 + \lambda_n^\top V(\theta_0, X_n)$.
- Update: $M_n = M_{n-1} E_n$.
- Stopping Rule: Halt and reject $H_0$ if $M_n \ge 1/\alpha$.
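The loop above can be sketched end to end for the canonical bounded-mean problem. This is a minimal illustration under our own assumptions (one-sided null $H_0: \mathbb{E}[X] \le \mu_0$ for $X \in [0,1]$; the crude plug-in bet and the clipping constant `c` are illustrative choices, not a prescription from the cited papers):

```python
def sequential_mean_test(stream, mu0=0.5, alpha=0.05, c=0.5):
    """Anytime-valid sequential test of H0: E[X] <= mu0 for X in [0, 1].

    The increment E_n = 1 + lam_n * (X_n - mu0) has null expectation
    <= 1 whenever lam_n >= 0, and stays strictly positive because
    lam_n is clipped to [0, 0.9 / mu0]. The bet lam_n is predictable:
    it uses only observations strictly before X_n. Requires 0 < mu0 < 1.
    """
    wealth, total, n = 1.0, 0.0, 0
    for x in stream:
        mean_so_far = total / n if n > 0 else mu0      # predictable plug-in
        lam = max(0.0, min(c * (mean_so_far - mu0) / (mu0 * (1 - mu0)),
                           0.9 / mu0))                  # adaptation
        wealth *= 1.0 + lam * (x - mu0)                 # increment + update
        n, total = n + 1, total + x
        if wealth >= 1.0 / alpha:                       # Ville threshold
            return True, n, wealth                      # reject H0
    return False, n, wealth                             # never crossed 1/alpha
```

When the stream truly has mean above $\mu_0$, the wealth compounds multiplicatively and the threshold $1/\alpha$ is crossed quickly; when the null holds, the wealth rarely grows and the test simply never stops.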
In higher-dimensional or non-bounded data settings, sub-$\psi$ cumulant generating function bounds allow construction of exponential supermartingales:
$M_n^\lambda = \exp\left(\lambda S_n - \psi(\lambda)\, V_n\right),$
where $S_n$ is a centered cumulative sum and $V_n$ an accumulated variance process; predictable mixtures over $\lambda$ and the tuning parameters yield e-processes robust to sub-exponential or heavy-tailed data (Casgrain et al., 2022).
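In the simplest sub-Gaussian case, $\psi(\lambda) = \lambda^2 \sigma^2 / 2$ and $V_n = n$, and a finite uniform mixture over a grid of $\lambda$ values gives a concrete e-process. A minimal sketch (the grid of four $\lambda$ values is our illustrative choice):

```python
import math

def subgaussian_e_process(xs, mu0=0.0, sigma=1.0,
                          lambdas=(0.1, 0.25, 0.5, 1.0)):
    # Uniform mixture of exponential supermartingales for data assumed
    # sigma-sub-Gaussian under H0: E[X] = mu0. For each lambda,
    # exp(lambda * S_n - n * lambda**2 * sigma**2 / 2) is a nonnegative
    # supermartingale, so the average over the grid is again an e-process.
    s, path = 0.0, []
    for n, x in enumerate(xs, start=1):
        s += x - mu0
        mix = sum(math.exp(lam * s - n * lam * lam * sigma ** 2 / 2)
                  for lam in lambdas) / len(lambdas)
        path.append(mix)
    return path
```

Under the null the quadratic penalty dominates and the wealth drifts down; under a shifted mean, the $\lambda$ closest to the optimal bet drives exponential growth of the mixture.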
For broad composite nulls, the family of e-processes is often mixed with respect to a prior or adapted with online learning (Waudby-Smith et al., 3 Apr 2025, Ramdas et al., 2024). Portfolios of e-processes with sublinear "regret" versus the oracle best in hindsight yield universal log-optimality (Waudby-Smith et al., 3 Apr 2025).
4. Power Analysis and Regret Bounds
Under composite alternatives, sequential e-valuators achieve exponential power growth if there exists $\theta \in \Theta$ such that
$\liminf_{n\to\infty} \frac{1}{n} \log L_n^\theta > 0 \quad \text{almost surely under the alternative},$
and if the regret,
$\mathrm{Regret}_n = \max_{\theta\in\Theta}\left(\log L_n^\theta - \log M_n\right), \qquad \text{(for mixture $M_n$)}$
is $o(n)$ almost surely. Results from OCO (e.g., Follow-the-Leader, FTRL, OGD) guarantee sublinear regret and thus asymptotic power one (Casgrain et al., 2022, Waudby-Smith et al., 3 Apr 2025):
- Asymptotic Growth: For log-optimal betting strategies,
$\frac{1}{n} \log M_n \to r^*(P)$
almost surely under the alternative $P$, with $r^*(P) = \sup_{\theta\in\Theta} \mathbb{E}_P[\log E_1^\theta]$ the maximal expected log-increment (Waudby-Smith et al., 3 Apr 2025).
- Stopping-Time Efficiency: The expected stopping time under any $P$ in the alternative satisfies
$\mathbb{E}_P[\tau_\alpha] = \frac{\log(1/\alpha)}{r^*(P)}\,(1 + o(1)) \quad \text{as } \alpha \to 0,$
i.e., the test achieves optimal expected sample complexity, matching the pointwise Kelly rate (Waudby-Smith et al., 3 Apr 2025).
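The Kelly-rate sample complexity can be checked empirically. In this sketch (our own Monte Carlo construction, with illustrative parameters $q = 0.8$, $p_0 = 0.5$, $\alpha = 0.01$), the oracle likelihood-ratio bet is played against Bernoulli data drawn from the alternative, where $r^*$ equals the Kullback-Leibler divergence $\mathrm{KL}(q \| p_0)$:

```python
import math
import random

def kelly_stopping_time(q=0.8, p0=0.5, alpha=0.01, trials=200, seed=0):
    # Monte Carlo check of the Kelly-rate sample complexity: betting
    # the oracle likelihood ratio for Bernoulli(q) against the null
    # Bernoulli(p0) on data truly drawn from Bernoulli(q), the mean
    # time to cross 1/alpha should be close to log(1/alpha) / KL(q||p0).
    rng = random.Random(seed)
    kl = q * math.log(q / p0) + (1 - q) * math.log((1 - q) / (1 - p0))
    target = math.log(1.0 / alpha)
    times = []
    for _ in range(trials):
        log_wealth, n = 0.0, 0
        while log_wealth < target:
            x = rng.random() < q
            log_wealth += (math.log(q / p0) if x
                           else math.log((1 - q) / (1 - p0)))
            n += 1
        times.append(n)
    return sum(times) / trials, target / kl
```

The empirical average slightly exceeds the prediction because the log-wealth overshoots the threshold at the crossing step; the ratio tends to one as $\alpha \to 0$.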
5. Applicability and Scope
Sequential hypothesis testing-based e-valuators, as developed in (Casgrain et al., 2022), cover
- Means and mean ratios under boundedness or sub-Gaussian/sub-exponential conditions,
- Quantiles and expectiles including VaR, multi-quantile vectors, and regression functionals,
- Goodness-of-fit, forecast calibration, and probabilistic model assessment in nonparametric setups (Arnold et al., 2021, Henzi et al., 2021),
- High-dimensional and ML-driven tests (e.g., deep classification two-sample, online independence and conditional independence) with nonparametric power via learned betting strategies (Pandeva et al., 2023, Pandeva et al., 2022).
The framework is also extensible to online and multiple testing settings—where e-values per hypothesis are multiplied or combined via closed testing or "discovery matrix" procedures for strong FWER/FDP control (Fischer et al., 2024, Vovk et al., 2020, Hartog et al., 15 Jan 2025).
6. Robustness, Extensions, and Practical Considerations
Robustness to unbounded or heavy-tailed data is handled via sub-$\psi$ cumulant generating function bounds; e-processes can be constructed in exponential form, and mixtures over parameter or tuning spaces can be performed using online convex optimization (Casgrain et al., 2022). The same formalism admits sequential testing with estimation, confidence sequences, and stopping-time statements.
Advantages include:
- Anytime-valid error control: Type I error control is uniform over all (possibly data-adaptive) stopping rules;
- Direct interpretability of evidence: The process $M_n$ can be regarded as accumulated "betting capital" against $H_0$;
- Flexible, adaptive power optimization: OCO or hedged mixtures allow minimax or log-optimal adaptivity over alternatives;
- Compatibility with decision-theoretic and modern ML protocols, supporting test design via virtually any scoring rule or identification functional.
Prominent limitations and considerations include:
- The requirement that the null functional be explicitly elicitable and have a strict identification function.
- Power may be below fixed-sample tests in specific dependence structures or high-lag blocking (Casgrain et al., 2022, Henzi et al., 2021).
- Tuning of betting fractions or mixture priors is critical for power; algorithmic learnability conditions are needed for some log-optimality results.
For particular tasks such as sequential forecast calibration or nonparametric two-sample testing, specialized construction and batching strategies provide empirical evidence of strong power and low Type I error in both simulated and real-data settings (Arnold et al., 2021, Pandeva et al., 2022).
7. Connections and Impact
This methodology unifies frequentist and Bayesian perspectives: e-values generalize p-values (via $1/E$ calibration) but permit optional continuation and combination; in the simple hypothesis case, e-values are likelihood ratios and coincide with Bayes factors under suitable priors (Ramdas et al., 2024, Grünwald et al., 2019). The interpretation via test supermartingales ties e-valuator procedures to game-theoretic probability and log-optimal portfolio constructions.
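The $1/E$ calibration mentioned above is a one-liner in practice; validity follows directly from Markov's inequality, since $P(E \ge 1/\alpha) \le \alpha\,\mathbb{E}[E] \le \alpha$ under the null. A minimal sketch:

```python
def e_to_p(e_value):
    # Universal e-to-p calibration: p = min(1, 1/E) is a valid p-value
    # whenever E is an e-value, by Markov's inequality under the null.
    if e_value < 0:
        raise ValueError("an e-value must be nonnegative")
    return min(1.0, 1.0 / e_value) if e_value > 0 else 1.0
```

Unlike a p-value, the underlying e-value may still be multiplied by further e-values collected later without invalidating the guarantee, which is precisely the optional-continuation property the passage describes.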
Sequential e-valuators underpin modern approaches to universal inference, multistage experimental design, adaptive clinical trials, online learning, and real-time decision systems. Their generality, statistical safety, and universality under optional stopping make them central tools for rigorous sequential inference in contemporary statistics and machine learning (Casgrain et al., 2022, Ramdas et al., 2024, Waudby-Smith et al., 3 Apr 2025, Pandeva et al., 2023).