
Sequential Hypothesis Testing E-valuators

Updated 9 January 2026
  • Sequential hypothesis testing-based e-valuators are anytime-valid statistical methods that use e-values to construct nonnegative supermartingales for rigorous Type I error control.
  • They dynamically adapt through online convex optimization and mixture strategies, optimizing sample complexity and achieving exponential power growth under alternative hypotheses.
  • This framework unifies frequentist and Bayesian inference, supporting applications in online learning, adaptive trials, and real-time decision systems.

Sequential hypothesis testing-based e-valuators are anytime-valid statistical procedures that evaluate hypotheses using streams of data by dynamically updating a process—usually a nonnegative supermartingale—constructed from so-called e-values or e-variables. These methods allow valid inference under optional stopping, adversarial data collection, and adaptivity, and are particularly powerful for nonparametric, composite, and online testing tasks. The e-value paradigm unifies testing, confidence sequences, and log-optimal betting strategies, with foundational results ensuring rigorous Type I error control, compatibility with online convex optimization, and systematic construction schemas for a wide variety of testing problems.

1. Mathematical Foundations of Sequential E-valuators

Let $\mathcal{P}$ denote a family of distributions over a sample space $\mathcal{X}$ and let $T:\mathcal{P}\rightarrow\Theta\subset\mathbb{R}^d$ be an elicitable and identifiable functional of interest (e.g., mean, quantile, expectile), specified via a strictly consistent scoring rule $S:\Theta\times\mathcal{X}\rightarrow\mathbb{R}$ such that, for all $P\in\mathcal{P}$, the population risk

$$\theta\mapsto \mathbb{E}_{X\sim P}S(\theta,X)$$

is uniquely minimized at $\theta=T(P)$ (Casgrain et al., 2022). The identification function $V(\theta,x)=\nabla_\theta S(\theta,x)$ links the scoring rule to a vector-valued root characterization: $T(P)$ satisfies $\mathbb{E}_{X\sim P}[V(\theta,X)]=0$.

A sequential e-value process (or e-process) is a nonnegative adapted process $(M_n)_{n\ge 0}$ in a filtration $(\mathcal{F}_n)_{n\ge 0}$ such that, under the null hypothesis $H_0:\ T(P) = \theta_0$,

$$\mathbb{E}[M_n\mid\mathcal{F}_{n-1}]\le M_{n-1},\qquad M_0=1.$$

The one-step "betting" increments are constructed as

$$m_n = 1 + \eta_n V(\theta_0, X_n),$$

with $\eta_n$ a predictable sequence chosen to ensure $m_n\ge 0$ ($\eta_n V(\theta_0,X_n) > -1$). The process $M_n = \prod_{i=1}^n m_i$ is a nonnegative supermartingale and thus a valid e-process (Casgrain et al., 2022, Ramdas et al., 2024).

More generally, for any sequential e-variables $E_n$ adapted to $\mathcal{F}_n$ with $\mathbb{E}[E_n\mid\mathcal{F}_{n-1}]\le 1$, the product $M_n = \prod_{i=1}^n E_i$ forms an e-process (Arnold et al., 2021). This construction is central to the sequential hypothesis testing-based e-valuator paradigm.
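
As a concrete illustration, here is a minimal Python sketch of the betting construction for the scalar null $H_0:\mathbb{E}[X]=\theta_0$ with observations in $[0,1]$, using $V(\theta_0,x)=x-\theta_0$; the function name, default constants, and fixed betting fraction are illustrative choices, not prescriptions from the cited papers.

```python
import numpy as np

def betting_eprocess(xs, theta0=0.5, eta=0.5):
    """Betting-style e-process for H0: E[X] = theta0 with X in [0, 1].

    Uses V(theta0, x) = x - theta0 and a fixed betting fraction eta;
    eta must satisfy 1 + eta * (x - theta0) >= 0 for all attainable x.
    """
    M, path = 1.0, []
    for x in xs:
        M *= 1.0 + eta * (x - theta0)  # one-step increment m_n
        path.append(M)
    return np.array(path)

# Under H0 each factor has conditional expectation 1, so (M_n) is a
# nonnegative martingale started at 1.
rng = np.random.default_rng(0)
M_path = betting_eprocess(rng.uniform(0.0, 1.0, size=1000))
```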

2. Anytime-Validity and Type I Error Control

Sequential e-valuators provide rigorous Type I error control at any stopping time via Ville's inequality:

$$\mathbb{P}_{H_0}\left(\sup_{n\ge 0}M_n\ge 1/\alpha\right) \le \alpha$$

for any $\alpha\in(0,1)$ (Casgrain et al., 2022, Ramdas et al., 2024, Arnold et al., 2021). Consequently, the stopping rule

$$\tau = \inf\{n\ge 1 : M_n \ge 1/\alpha\}$$

defines an $\alpha$-level anytime-valid sequential test. This guarantee is robust to optional stopping, stopping for any data-dependent reason, and even certain forms of adversarial data collection.
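
The guarantee can be sanity-checked by simulation: generate data under $H_0$ and record how often the e-process ever crosses $1/\alpha$ over a finite horizon. The sketch below (all names and constants are illustrative) should report an empirical crossing rate below $\alpha$.

```python
import numpy as np

def ever_crosses(xs, theta0, eta, alpha):
    """True if the betting e-process crosses 1/alpha anywhere along xs."""
    M = 1.0
    for x in xs:
        M *= 1.0 + eta * (x - theta0)
        if M >= 1.0 / alpha:
            return True  # reject H0 at the first crossing time tau
    return False

rng = np.random.default_rng(1)
alpha, trials = 0.05, 2000
# Data generated under H0: Uniform[0, 1], so E[X] = 0.5 = theta0.
hits = sum(
    ever_crosses(rng.uniform(0.0, 1.0, size=500), 0.5, 0.5, alpha)
    for _ in range(trials)
)
print(f"empirical crossing rate {hits / trials:.3f} <= alpha = {alpha}")
```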

In composite or nonparametric settings, predictable mixture strategies over a family $\{L^\theta\}_{\theta\in\Theta}$ of base supermartingales or e-values are often employed. Here, the e-process is

$$M_n = \prod_{i=1}^n \int_\Theta \frac{L_i^\theta}{L_{i-1}^\theta}\,\pi_i(d\theta),$$

with $\pi_i$ an $\mathcal{F}_{i-1}$-measurable probability measure over $\Theta$ (Casgrain et al., 2022). Mixture-of-experts or online-convex-optimization (OCO) strategies select the mixing measure to adaptively maximize expected log-growth or minimize regret.
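
A simple concrete instance fixes a uniform mixing measure over a finite grid of betting fractions, which reduces to the classical method of mixtures $M_n=\int L_n^\theta\,\pi(d\theta)$; the sketch below assumes bounded data in $[0,1]$ and a grid on which every base factor stays nonnegative.

```python
import numpy as np

def mixture_eprocess(xs, theta0=0.5, etas=np.linspace(-0.5, 0.5, 11)):
    """Method-of-mixtures e-process over a finite grid of betting fractions.

    Each eta on the grid indexes a base martingale L_n^eta; averaging the
    L_n^eta under a fixed uniform measure keeps the mixture a nonnegative
    supermartingale under H0: E[X] = theta0, X in [0, 1].
    """
    logL = np.zeros(len(etas))  # running log L_n^eta for each grid point
    path = []
    for x in xs:
        logL += np.log1p(etas * (x - theta0))
        path.append(np.exp(logL).mean())  # M_n = (1/K) * sum_eta L_n^eta
    return np.array(path)
```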

3. Design and Algorithmic Structure

The sequential e-valuator framework is algorithmically instantiated as follows (Casgrain et al., 2022, Ramdas et al., 2024):

  1. Initialization: Set $M_0=1$ and choose a target significance level $\alpha$.
  2. Observation: For $n=1,2,\ldots$, observe new data $X_n$.
  3. Adaptation: Compute $\eta_n$ (or the mixing measure $\pi_n$ for mixture strategies) as a function of $X_1,\ldots,X_{n-1}$ only.
  4. Increment Computation:

$$m_n = 1 + \eta_n V(\theta_0, X_n) \quad \text{or} \quad m_n = \int_\Theta \frac{L_n^\theta}{L_{n-1}^\theta}\,\pi_n(d\theta).$$

  5. Update: $M_n = M_{n-1}m_n$.
  6. Stopping Rule: Halt and reject $H_0$ if $M_n \ge 1/\alpha$ (see the sketch after this list).
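
A minimal sketch of this loop, assuming bounded data in $[0,1]$ and substituting a crude plug-in Kelly estimate for the OCO update in step 3 (the adaptive rule, clipping constant, and names are illustrative, not the cited algorithms):

```python
import numpy as np

def sequential_evaluator(stream, theta0, alpha=0.05, eta_max=1.0):
    """Sketch of the loop above for H0: E[X] = theta0 with X in [0, 1].

    eta_n is a plug-in Kelly-style ratio E[V]/E[V^2] estimated from past
    observations only (hence predictable), clipped so that every increment
    stays nonnegative; eta_max should not exceed 1 / max|V|.
    """
    M, mean_v, msq_v, n = 1.0, 0.0, 1.0, 0
    for x in stream:                                      # 2. observe X_n
        eta = float(np.clip(mean_v / msq_v, -eta_max, eta_max))  # 3. adapt
        v = x - theta0                                    # V(theta0, X_n)
        M *= 1.0 + eta * v                                # 4.-5. update M_n
        if M >= 1.0 / alpha:                              # 6. stopping rule
            return n + 1, M                               # reject at tau
        n += 1
        mean_v += (v - mean_v) / n                        # running E[V]
        msq_v = max(msq_v + (v * v - msq_v) / n, 1e-3)    # running E[V^2]
    return None, M  # stream exhausted without rejecting H0
```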

In higher-dimensional or unbounded data settings, sub-$\psi$ cumulant generating function bounds allow construction of exponential supermartingales:

$$L_n(u) = \exp\left(u Y_n - \sum_{i=1}^n \psi(u)\right), \qquad Y_n = \sum_{i=1}^n \Delta Y_i, \quad \Delta Y_i = V(\theta_0, X_i),$$

and predictable mixtures over $u$ and $\theta$ yield e-processes robust to sub-exponential or heavy-tailed data (Casgrain et al., 2022).
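
As one concrete case, a sub-Gaussian bound $\psi(u)=u^2\sigma^2/2$ with a discrete mixing measure over a grid of $u$ values gives a directly computable mixture supermartingale; the grid and names below are illustrative assumptions.

```python
import numpy as np

def subgaussian_mixture_eprocess(vs, sigma=1.0, us=np.linspace(0.1, 2.0, 20)):
    """Discrete mixture of exponential supermartingales L_n(u).

    Assumes the increments v are sigma-sub-Gaussian with mean zero under
    H0, so psi(u) = u**2 * sigma**2 / 2 bounds the cumulant generating
    function and each L_n(u) is a supermartingale; so is their average.
    """
    Y, n, path = 0.0, 0, []
    for v in vs:
        Y += v
        n += 1
        path.append(np.exp(us * Y - n * us**2 * sigma**2 / 2.0).mean())
    return np.array(path)
```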

For broad composite nulls, the family of e-processes $M_n^{(\theta)}$ is often mixed with respect to a prior or adapted with online learning (Waudby-Smith et al., 3 Apr 2025, Ramdas et al., 2024). Portfolios of e-processes with sublinear "regret" versus the oracle best in hindsight yield universal log-optimality (Waudby-Smith et al., 3 Apr 2025).
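
One way to realize such a portfolio over finitely many e-processes is to weight each expert by its accumulated capital times a prior, which makes the portfolio telescope to the prior mixture and caps its log-regret at $\log K$ under a uniform prior; the sketch below assumes a finite expert set and uses illustrative names.

```python
import numpy as np

def portfolio_eprocess(increment_rows, prior=None):
    """Portfolio over K e-processes with capital-proportional weights.

    increment_rows[n] holds the K one-step e-variable factors at round n.
    Weighting expert k by prior[k] times its accumulated capital makes the
    portfolio telescope to the prior mixture sum_k prior[k] * L_n^k, so its
    log-regret to the best expert in hindsight is at most log K (uniform
    prior).
    """
    K = increment_rows.shape[1]
    w = np.full(K, 1.0 / K) if prior is None else prior / prior.sum()
    capital = np.ones(K)  # L_n^k for each expert k
    M, path = 1.0, []
    for row in increment_rows:
        pi_n = w * capital / (w * capital).sum()  # predictable weights pi_n
        M *= float(pi_n @ row)                    # portfolio factor m_n
        capital *= row                            # update each L_n^k
        path.append(M)                            # equals (w * capital).sum()
    return np.array(path)
```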

4. Power Analysis and Regret Bounds

Under composite alternatives, sequential e-valuators achieve exponential power growth if there exists $\theta^*$ such that

$$\liminf_{n\to\infty}\tfrac{1}{n} \log L_n^{\theta^*} > 0\quad \text{a.s.}$$

and if the regret,

$$\mathrm{Regret}_n = \max_{\theta\in\Theta}\left(\log L_n^\theta - \log M_n\right), \qquad \text{(for the mixture } M_n\text{)}$$

is $o(n)$ almost surely. Results from OCO (e.g., Follow-the-Leader, FTRL, OGD) guarantee sublinear regret and thus asymptotic power one (Casgrain et al., 2022, Waudby-Smith et al., 3 Apr 2025):

  • Asymptotic Growth: For log-optimal betting strategies,

$$\lim_{n\to\infty} \frac{1}{n} \log M_n = \ell^*_Q > 0$$

almost surely under the alternative $Q$, with $\ell^*_Q$ the maximal expected log-increment (Waudby-Smith et al., 3 Apr 2025).

  • Stopping-Time Efficiency: The expected stopping time under any $Q$ satisfies

$$\lim_{\alpha\to 0} \frac{\mathbb{E}_Q[\tau_\alpha]}{\log(1/\alpha)} = \frac{1}{\ell^*_Q},$$

i.e., the test achieves optimal expected sample complexity, matching the pointwise Kelly rate (Waudby-Smith et al., 3 Apr 2025); a numerical sanity check is sketched below.
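
The stopping-time result can be checked numerically for a fixed betting fraction, whose own growth rate $\ell=\mathbb{E}_Q[\log(1+\eta V)]$ plays the role of $\ell^*_Q$; in this illustrative sketch (the alternative distribution and all constants are assumptions), the ratio $\mathbb{E}_Q[\tau_\alpha]\,\ell/\log(1/\alpha)$ should approach one as $\alpha\to 0$.

```python
import numpy as np

def stopping_time(xs, theta0, eta, alpha):
    """First n at which the fixed-eta e-process crosses 1/alpha, else None."""
    logM = 0.0  # track log M_n for numerical stability
    for n, x in enumerate(xs, start=1):
        logM += np.log1p(eta * (x - theta0))
        if logM >= np.log(1.0 / alpha):
            return n
    return None

rng = np.random.default_rng(2)
alpha, eta, theta0 = 1e-3, 0.5, 0.5
# Alternative Q: Bernoulli(0.7); growth rate of this fixed-eta strategy is
# ell = E_Q[log(1 + eta * (X - theta0))] > 0.
ell = 0.7 * np.log1p(eta * 0.5) + 0.3 * np.log1p(-eta * 0.5)
taus = [stopping_time(rng.binomial(1, 0.7, size=5000), theta0, eta, alpha)
        for _ in range(200)]
taus = [t for t in taus if t is not None]  # horizon is generous; rarely None
print(np.mean(taus) * ell / np.log(1.0 / alpha))  # should be close to 1
```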

5. Applicability and Scope

Sequential hypothesis testing-based e-valuators, as developed in (Casgrain et al., 2022), cover

  • Means and mean ratios under boundedness or sub-Gaussian/sub-exponential conditions,
  • Quantiles and expectiles including VaR, multi-quantile vectors, and regression functionals,
  • Goodness-of-fit, forecast calibration, and probabilistic model assessment in nonparametric setups (Arnold et al., 2021, Henzi et al., 2021),
  • High-dimensional and ML-driven tests (e.g., deep-classifier two-sample tests, online independence and conditional independence tests) with nonparametric power via learned betting strategies (Pandeva et al., 2023, Pandeva et al., 2022).

The framework is also extensible to online and multiple testing settings—where e-values per hypothesis are multiplied or combined via closed testing or "discovery matrix" procedures for strong FWER/FDP control (Fischer et al., 2024, Vovk et al., 2020, Hartog et al., 15 Jan 2025).

6. Robustness, Extensions, and Practical Considerations

Robustness to unbounded or heavy-tailed data is handled via sub-$\psi$ bounds; e-processes can be constructed in exponential form, and mixtures over parameter or tuning spaces can be performed using online convex optimization (Casgrain et al., 2022). The same formalism supports sequential estimation, confidence sequences, and inference statements at arbitrary stopping times.

Advantages include:

  • Anytime-valid error control: Type I error control is uniform over all (possibly data-adaptive) stopping rules;
  • Direct interpretability of evidence: The process $M_n$ can be regarded as accumulated "betting capital" against $H_0$;
  • Flexible, adaptive power optimization: OCO or hedged mixtures allow minimax or log-optimal adaptivity over alternatives;
  • Compatibility with decision-theoretic and modern ML protocols, supporting test design via virtually any scoring rule or identification functional.

Prominent limitations and considerations include:

  • The requirement that the null functional be explicitly elicitable and have a strict identification function.
  • Power may fall below that of fixed-sample tests under certain dependence structures or high-lag blocking (Casgrain et al., 2022, Henzi et al., 2021).
  • Tuning of betting fractions or mixture priors is critical for power; algorithmic learnability conditions are needed for some log-optimality results.

For particular tasks such as sequential forecast calibration or nonparametric two-sample testing, specialized construction and batching strategies provide empirical evidence of strong power and low Type I error in both simulated and real-data settings (Arnold et al., 2021, Pandeva et al., 2022).

7. Connections and Impact

This methodology unifies frequentist and Bayesian perspectives: e-values generalize p-values (via $1/E$ calibration) but permit optional continuation and combination; in the simple hypothesis case, e-values are likelihood ratios and coincide with Bayes factors under suitable priors (Ramdas et al., 2024, Grünwald et al., 2019). The interpretation via test supermartingales ties e-valuator procedures to game-theoretic probability and log-optimal portfolio constructions.
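
The calibration is simply Markov's inequality: an e-variable $E$ is nonnegative with $\mathbb{E}_{H_0}[E]\le 1$, so

$$\mathbb{P}_{H_0}\left(E \ge \tfrac{1}{\alpha}\right) \le \alpha\,\mathbb{E}_{H_0}[E] \le \alpha,$$

hence $p := \min\{1,\, 1/E\}$ is a valid p-value; applying the same bound to $\sup_n M_n$ through Ville's inequality yields the anytime-valid version.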

Sequential e-valuators underpin modern approaches to universal inference, multistage experimental design, adaptive clinical trials, online learning, and real-time decision systems. Their generality, statistical safety, and universality under optional stopping make them central tools for rigorous sequential inference in contemporary statistics and machine learning (Casgrain et al., 2022, Ramdas et al., 2024, Waudby-Smith et al., 3 Apr 2025, Pandeva et al., 2023).
