Testing-by-Betting Framework
- The testing-by-betting framework is a sequential, game-theoretic method that recasts hypothesis testing as a wealth accumulation game against the null model.
- It uses nonnegative martingales, known as e-processes, to quantify evidence and guarantee anytime-valid, nonasymptotic type-I error control.
- The framework’s adaptability supports applications in clinical trials, nonparametric testing, and ML model auditing through online learning and flexible betting strategies.
The testing-by-betting framework is a sequential, game-theoretic paradigm for hypothesis testing and interval estimation, in which statistical inference is recast as a wealth accumulation game between a bettor and the (simple or composite) null model. Instead of relying on fixed-sample p-values or likelihood ratios alone, the core object is a nonnegative martingale or supermartingale, often called an e-process, whose growth quantifies evidence against the null and whose trajectories support anytime-valid inference. This approach allows adaptivity, optional stopping, online decision-making, and interpretable, information-theoretic summaries of evidence, while guaranteeing nonasymptotic type-I error control (Shafer, 2023, Greenland, 2021, Chen et al., 2024). The methodology now spans parametric and nonparametric testing, Monte Carlo tests, exchangeability, conditional independence, adaptive two-sample testing, auditing of ML models, clinical trial monitoring, and extensions to borrowing and bargaining scenarios.
1. Foundations: Game-Theoretic Probability and Martingales
The framework is rooted in the principle that evidence against a forecaster (the null model) is measured by the success of a sequence of bets, formalized as a capital (wealth) process (Sₜ or Wₜ) updated according to the bettor’s stakes and the realized outcomes (Shafer, 2023, Greenland, 2021). For example, given a filtration ℱₜ (encapsulating the information available at time t), a typical wealth process is defined recursively as

$$W_t = W_{t-1}\,(1 + \lambda_t g_t), \qquad W_0 = 1,$$

where gₜ is a real-valued payoff whose conditional mean under the null is zero (or at most zero), e.g., a score or contrast reflecting the discrepancy between observed and expected behavior, and λₜ is a predictable (ℱₜ₋₁-measurable) betting fraction constrained so that 1 + λₜgₜ ≥ 0, which keeps the wealth nonnegative.
If (Wₜ) is a nonnegative martingale or supermartingale under the null with W₀ = 1, Ville’s inequality provides time-uniform control:

$$P_{H_0}\big(\exists\, t \ge 1 : W_t \ge 1/\alpha\big) \le \alpha$$

for any α ∈ (0,1). This directly yields a sequential level-α test: reject H₀ as soon as the wealth process reaches 1/α.
Importantly, the betting process may adapt to history, incorporate online learners, or be aggregated over functions, but validity is assured as long as the martingale property is respected (Shaer et al., 2022, Pandeva et al., 2023, Shekhar et al., 2021).
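As a minimal sketch of the wealth recursion and the Ville threshold, the following tests H₀: P(x = 1) = 1/2 on binary data. It assumes the multiplicative update Wₜ = Wₜ₋₁(1 + λgₜ) with W₀ = 1, the payoff gₜ = 2xₜ − 1, and a fixed stake λ; the function names and the choice λ = 0.2 are illustrative, not taken from the cited works.

```python
import random


def betting_test(xs, alpha=0.05, lam=0.2):
    """Sequential betting test of H0: P(x = 1) = 1/2 for binary data.

    Returns (rejected, stopping_time, final_wealth).
    """
    if not -1.0 < lam < 1.0:
        raise ValueError("lam must lie in (-1, 1) so wealth stays positive")
    wealth = 1.0  # initial capital W_0 = 1
    for t, x in enumerate(xs, start=1):
        g = 2 * x - 1               # payoff in {-1, +1}, mean zero under H0
        wealth *= 1.0 + lam * g     # W_t = W_{t-1} * (1 + lam * g_t)
        if wealth >= 1.0 / alpha:   # Ville threshold 1/alpha
            return True, t, wealth
    return False, len(xs), wealth


def false_alarm_rate(n_runs=500, horizon=200, alpha=0.05, seed=0):
    """Fraction of simulated fair-coin sequences on which the test rejects."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        xs = [rng.randint(0, 1) for _ in range(horizon)]
        hits += betting_test(xs, alpha)[0]
    return hits / n_runs
```

On a run of all ones the wealth is 1.2ᵗ, which first reaches 1/α = 20 at t = 17. Because the test stops the first time the threshold is reached, checking after every observation does not require any correction; a fixed stake is the simplest choice, and adaptive stakes would typically grow wealth faster under alternatives.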
2. Core Elements: E-Values, Surprisal Scores, and Their Interpretations
E-processes and e-values: The process (Wₜ) is often called an e-process, with each Wₜ an e-value: under H₀, $\mathbb{E}[W_t] \le 1$, so Wₜ quantifies the strength of evidence as a betting score (Shafer, 2023, Zampieri, 4 Dec 2025). Reaching Wₜ = k means the bettor has multiplied the initial stake k-fold, which can be interpreted as k:1 betting odds against the null.
Surprisal (S-value): Transformations of a p-value p such as the surprisal, $s = -\log_2 p$ (or $s = -\ln p$), are motivated by information theory and encode the self-information in bits (base 2) or nats (base e) (Greenland, 2021). For a uniform p-value under H₀, the surprisal in nats is Exp(1)-distributed and therefore has expectation 1, making S-values naturally calibrated fair betting scores under the null. The S-value connects statistical evidence to intuition (e.g., “as surprising as 7 heads in 7 tosses” for 7 bits against H₀).
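The conversion from a p-value to a surprisal can be sketched in a few lines, assuming the definition s = −log(p) in the chosen base. The helper name `s_value` is hypothetical.

```python
import math


def s_value(p, base=2.0):
    """Surprisal of a p-value: -log_base(p); bits for base 2, nats for base e."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p) / math.log(base)
```

For instance, `s_value(0.5 ** 7)` gives 7 bits, matching the seven-heads example, while `s_value(0.05)` gives about 4.3 bits, only slightly more surprising than four heads in four tosses.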
Adaptation and Interpretability: The selection of betting scores may appear arbitrary, but rationales based on information content, interpretability, and historical mathematical context (e.g., Shannon information, decision relevance) can justify specific choices over others (Greenland, 2021).
3. Methodological Flexibility: Sequential, Adaptive, and Nonparametric Extensions
The framework extends naturally to a wide range of settings:
- Randomization-based and nonparametric clinical trials: The e-process is constructed by betting on treatment assignments given observed outcomes, with the martingale property guaranteed by the randomization mechanism, providing anytime-valid type I error control regardless of monitoring strategy (Zampieri, 4 Dec 2025).
- Two-sample, invariance, and independence testing: The bettor selects payoff functions (e.g., from integral probability metrics or reproducing kernel Hilbert spaces) with the goal of maximizing expected wealth growth under alternatives. Online learning (e.g., gradient ascent, Online Newton Step) is used to adapt bets in real time, connecting power and expected stopping time to regret bounds in online convex optimization (Pandeva et al., 2023, Shekhar et al., 2021, Chen et al., 11 Feb 2025).
- Monte Carlo permutation testing and resampling: The betting score accumulates over sequential draws, recovering and generalizing optimal stopping rules (e.g., Besag-Clifford), while yielding anytime-valid p-values and e-values and providing closed-form expressions for wealth under binomial or binomial-mixture strategies (Fischer et al., 2024).
- Conditional independence and model-X knockoff settings: By leveraging knockoff samples and symmetry-based payoffs, sequential betting controls type I error for testing conditional independence with arbitrary dependence structures and ML-based scoring (Shaer et al., 2022, Teneggi et al., 2024).
- Heterogeneous data and active sampling: Adaptive strategies, such as greedy or ε-greedy source selection, are embedded within the betting game to accelerate discrimination between distributions by focusing on informative data sources, all under nonparametric and distribution-free conditions (Hsu et al., 26 Dec 2025).
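The Monte Carlo permutation item above can be illustrated with a small sketch. It assumes a continuous test statistic (no ties), under which the probability that the next permuted statistic is at least the observed one, given ℓ such "losses" in the previous t − 1 draws, is (ℓ + 1)/(t + 1) under H₀. The bettor wagers against this with a fixed alternative loss probability `q`, in the spirit of a binomial strategy; the function name, the mean-difference statistic, and the default `q` are assumptions for illustration and are not claimed to reproduce the construction of Fischer et al.

```python
import random


def permutation_betting_test(x, y, alpha=0.05, q=0.01, max_draws=2000, seed=0):
    """Sequential Monte Carlo permutation test by betting.

    Returns (rejected, draws_used, final_wealth).
    """
    rng = random.Random(seed)

    def stat(a, b):
        # Illustrative statistic: absolute difference in sample means.
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    wealth, losses = 1.0, 0
    for t in range(1, max_draws + 1):
        rng.shuffle(pooled)
        loss = stat(pooled[:n_x], pooled[n_x:]) >= observed
        p0 = (losses + 1) / (t + 1)  # null probability of a loss at draw t
        # Likelihood-ratio bet: alternative loss probability q against p0.
        wealth *= (q / p0) if loss else ((1.0 - q) / (1.0 - p0))
        losses += loss
        if wealth >= 1.0 / alpha:
            return True, t, wealth
    return False, max_draws, wealth
```

With no losses the wealth after t draws equals (t + 1)(1 − q)ᵗ, so with q = 0.01 and α = 0.05 a clearly separated pair of samples is rejected after 25 permutations, and the procedure can stop there instead of running a fixed number of resamples.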
4. Statistical Properties: Error Control, Power, and Confidence Sequences
Type I error and admissibility: The core guarantee is exact nonasymptotic level-α control at any stopping time, even under data-dependent or optional stopping, including optional continuation with new data (Shafer, 2023, Zampieri, 4 Dec 2025, Duan et al., 2020). No correction for multiple testing over time is required.
Consistency and power: For a broad class of alternatives, the wealth process diverges with probability one, leading to “power one” against fixed alternatives. Expected sample size to rejection and the rate of wealth growth are tightly tied to problem difficulty (signal-to-noise, IPM distances), online regret, and adaptivity of the betting strategy (Shekhar et al., 2021, Chen et al., 11 Feb 2025).
Confidence sequences: By inverting parallel betting tests for all candidate parameter values or for composite nulls, one constructs time-uniform confidence sequences for the parameter of interest, with coverage at all times (Waudby-Smith et al., 2020, Shafer, 2023).
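The inversion step can be sketched for the mean of observations in [0, 1]. For each candidate mean m on a grid, the sketch runs a two-sided betting test of H₀: mean = m (the average of an "up" and a "down" wealth process, each with a fixed stake) and discards m once its wealth has reached 1/α. The grid, the fixed stake `lam`, and the function name are illustrative assumptions; adaptive stakes would usually give tighter intervals.

```python
def betting_confidence_sequence(xs, alpha=0.05, grid_size=201, lam=0.5):
    """Confidence sequence for the mean of data in [0, 1] by test inversion.

    Returns one (lower, upper) pair per observation.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1) so both wealth processes stay positive")
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    up = [1.0] * grid_size      # wealth from betting that the mean exceeds m
    down = [1.0] * grid_size    # wealth from betting that the mean is below m
    alive = [True] * grid_size  # candidates not yet rejected
    bounds = []
    for x in xs:
        for i, m in enumerate(grid):
            if not alive[i]:
                continue
            up[i] *= 1.0 + lam * (x - m)
            down[i] *= 1.0 - lam * (x - m)
            # The average of two martingales is a martingale under mean = m.
            if 0.5 * (up[i] + down[i]) >= 1.0 / alpha:
                alive[i] = False
        kept = [m for i, m in enumerate(grid) if alive[i]]
        # Report the hull of surviving candidates (conservative if not an interval).
        bounds.append((min(kept), max(kept)) if kept else (float("nan"), float("nan")))
    return bounds
```

Because a rejected candidate is never readmitted, the reported intervals are nested over time, and the coverage guarantee inherited from Ville's inequality holds simultaneously at all times for candidates on the grid.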
5. Practical Implementation: Algorithms, Special Cases, and Illustration
Algorithms and optimization: Effective betting fractions typically require online convex optimization algorithms (e.g., ONS, FTRL+barrier, optimistic variants) to rapidly grow wealth under H₁ (Chen et al., 11 Feb 2025). Self-concordant barriers allow safe, unrestricted bet updates, improving efficiency relative to conservative projections (Chen et al., 11 Feb 2025, Chen et al., 2024).
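A minimal sketch of an Online Newton Step (ONS) stake update follows, assuming payoffs in [−1, 1], stakes clipped to [−1/2, 1/2], and the step constant 2/(2 − ln 3) commonly used in the coin-betting literature. These constants and the function name are assumptions for illustration; the barrier-based and optimistic variants mentioned above are not shown.

```python
import math


def ons_wealth(payoffs, lam_max=0.5):
    """Wealth path when stakes follow an Online Newton Step update.

    Each stake is fixed before its payoff is seen, so it is predictable.
    """
    lam, a, wealth = 0.0, 1.0, 1.0
    step = 2.0 / (2.0 - math.log(3.0))
    path = []
    for g in payoffs:
        wealth *= 1.0 + lam * g    # settle the bet placed before seeing g
        z = -g / (1.0 + lam * g)   # gradient of the loss -log(1 + lam * g)
        a += z * z                 # running sum of squared gradients
        # Newton-style step, projected back onto the allowed stake range.
        lam = max(-lam_max, min(lam_max, lam - step * z / a))
        path.append(wealth)
    return path
```

The first round is played with stake 0, so wealth is unchanged; on a run of payoffs equal to +1 the stake then jumps to the cap 1/2 and wealth grows by a factor 1.5 per round. Clipping to [−1/2, 1/2] is the conservative projection that keeps every factor at least 1/2.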
Examples:
- In sequential Monte Carlo testing, closed-form wealth updates and binomial-mixture betting recover classical p-values and control resampling risk with minimal computation (Fischer et al., 2024).
- In clinical trials, the e-process can be constructed nonparametrically from randomization alone and provides robust early stopping procedures (Zampieri, 4 Dec 2025).
- In fairness monitoring or interaction rank tests, the game-theoretic approach supports covariate/adaptive ranking, direct interpretability, and seamless incorporation of online learning (Chugg et al., 2023, Duan et al., 2020).
6. Theoretical Innovations: Exchangeability, Filtration, Borrowing, and Bargaining
Testing exchangeability: Recent work demonstrates that naive single-observation betting is ineffective for exchangeability, but pairing the data and betting on the order within each pair yields a valid and powerful martingale test, achieving power one against a wide class of alternatives (Markov, AR(1), general ergodic) (Saha et al., 2023). The key is a minor adjustment to the filtration (coarsening it to pairs), which unlocks the martingale structure and error control.
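The pairing idea can be illustrated with a deliberately simplified sketch for real-valued data. Under exchangeability, swapping the two members of a pair leaves the joint law unchanged, so given the past each order of a pair of distinct values is equally likely and a bet on the order is fair. The fixed stake `lam`, which bets that the second value is larger (an upward trend), and the function name are assumptions; this is not claimed to be the construction of Saha et al.

```python
def pairwise_order_wealth(xs, lam=0.5):
    """Wealth from betting on the order within consecutive pairs.

    Returns the wealth after each complete pair; a trailing unpaired
    observation is ignored.
    """
    if not -1.0 < lam < 1.0:
        raise ValueError("lam must lie in (-1, 1) so wealth stays positive")
    wealth = 1.0
    path = []
    for i in range(0, len(xs) - 1, 2):
        first, second = xs[i], xs[i + 1]
        if first != second:                       # tied pairs carry no bet
            g = 1.0 if second > first else -1.0   # fair sign under exchangeability
            wealth *= 1.0 + lam * g
        path.append(wealth)
    return path
```

On a strictly increasing sequence every bet wins, so wealth grows by a factor 1.5 per pair, while `pairwise_order_wealth([1, 0, 0, 1])` returns `[0.5, 0.75]`. A stake that adapts to past pairs would be needed to detect dependence that does not show up as a trend.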
Borrowing and bargaining: Extensions allow the bettor to borrow after bankruptcy (incurring liabilities) or "bargain" for better odds. Net and gross wealth processes can still be controlled under explicit liabilities and interest conditions, though Type I error is inflated according to borrowing and odds mispricing (Wang et al., 2024). Adjusted evidence metrics (e.g., net wealth, discounted gross wealth) are precisely calibrated so that large e-values remain rare under null, provided borrowing is bounded and interest is correctly charged.
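| Variant | Key Formula/Rule | Type-I Control Condition |
|---|---|---|
| Classical betting | Nonneg. martingale, Ville: $P(\sup_t W_t \ge 1/\alpha) \le \alpha$ | Level $\alpha$ at any stopping time |
| Borrowing allowed | If borrowing is bounded and interest is correctly charged, control at $1+L$ | Inflated according to borrowing |
| Bargaining | Wealth discounted for the bargained odds | Inflated according to odds mispricing |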
7. Impact, Applications, and Future Directions
Impact and scope: The testing-by-betting framework unifies, generalizes, and strengthens sequential inference methodology across disciplines. Its adoption in statistical testing, ML model auditing, sequential clinical monitoring, adaptive trial design, and streaming data analysis demonstrates its broad utility (Pandeva et al., 2023, Chugg et al., 2023, Zampieri, 4 Dec 2025). Empirical studies consistently show type-I error control, rapid evidence accumulation, and power competitive with or surpassing fixed-sample baselines across synthetic, clinical, and real-world ML benchmark tasks (Chen et al., 11 Feb 2025, Chen et al., 2024, Chugg et al., 2023).
Open directions: Ongoing research targets sharper regret bounds for online bet selection, robust methods for model misspecification and distribution shift, refinement of e-value combination in multiple testing, further extensions to composite nulls, and operationalization in scientific workflows. The interpretive clarity and optional monitoring provided by the betting paradigm are expected to see further diffusion into statistical practice.
Controversies and misconceptions: A common misconception is that wealth growth is “arbitrary” because the bettor is free to construct bets in many ways. However, as shown by information-theoretically justified scores and robust calibration properties (Greenland, 2021), careful design permits decision-relevant and interpretable evidence reporting. The requirement that bets never exceed available capital keeps the wealth process nonnegative, which is what ensures that optional stopping does not invalidate error control. This is not a limitation but a core strength, enabling valid adaptation and continuation (Shafer, 2023, Chen et al., 2024).
Summary: The testing-by-betting framework provides a robust, nonasymptotic, and flexible foundation for sequential statistical inference, simultaneously offering strong error control, interpretability, adaptivity, and practical tractability across diverse statistical problems (Shafer, 2023, Greenland, 2021, Chen et al., 2024, Zampieri, 4 Dec 2025, Pandeva et al., 2023, Shekhar et al., 2021, Chen et al., 11 Feb 2025, Chen et al., 2024, Waudby-Smith et al., 2020, Shaer et al., 2022, Saha et al., 2023, Hsu et al., 26 Dec 2025, Chugg et al., 2023, Fischer et al., 2024, Duan et al., 2020, Wang et al., 2024).