Testing by Betting: A Statistical Framework
- Testing by Betting is a framework that reinterprets hypothesis testing as a sequential betting game, using an e-process to quantify evidence against a null hypothesis.
- The method leverages martingale theory and game-theoretic probability to achieve anytime risk control, overcoming limitations of traditional p-value approaches.
- Extensions address borrowing and mispriced odds by normalizing wealth processes, ensuring that even with leverage, statistical evidence remains valid.
Testing by Betting is a general statistical framework that interprets hypothesis testing as a dynamic betting game, transforming the evaluation of evidence into the language of financial wagers and wealth processes. The central object is an e-process (betting score), which quantifies evidence against a null hypothesis by tracking the cumulative returns of a sequential betting strategy. The core methodology exploits deep connections to martingale theory and game-theoretic probability, enabling uniformly valid, anytime risk control—properties unobtainable by classical p-value methodology. This paradigm has recently been extended to model the effects of leverage (“borrowing”) and “bargaining” for mispriced odds, which pose new trade-offs for the authenticity and interpretation of statistical evidence (Wang et al., 2024).
1. Foundations of Testing by Betting
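At its core, testing by betting formalizes statistical evidence as the capital accumulated by a gambler betting sequentially against a null. At time $t$, the gambler observes data $X_t$ adapted to a filtration $(\mathcal{F}_t)$. Under the null $H_0$, we require $\mathbb{E}[X_t \mid \mathcal{F}_{t-1}] = m_t$ for predictable $m_t$. A bet $\lambda_t$ is $\mathcal{F}_{t-1}$-measurable; wagering $\lambda_t$ on $X_t$ results in a payoff $\lambda_t (X_t - m_t)$. The total “wealth” evolves as $W_t = W_{t-1} + \lambda_t (X_t - m_t)$, with $W_0 = 1$. Provided $W_t$ remains nonnegative, this process is a (local) martingale under $H_0$.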
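The observed wealth $W_t$ is an e-value: large $W_t$ indicates strong evidence against $H_0$. For a nonnegative wealth process with unit initial capital, Ville's inequality provides a nonasymptotic guarantee:
$$\mathbb{P}_{H_0}\left(\sup_{t \ge 0} W_t \ge \frac{1}{\alpha}\right) \le \alpha,$$
so the rule “reject $H_0$ at the first $t$ for which $W_t \ge 1/\alpha$” controls Type I error at level $\alpha$ at all stopping times.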
An e-process is a nonnegative adapted process $(E_t)$ with $\mathbb{E}_{H_0}[E_\tau] \le 1$ for all stopping times $\tau$. E-processes generalize p-values to allow “anytime” (optional stopping) guarantees and are typically constructed via suitable betting strategies (Shafer, 2019; Waudby-Smith et al., 2020).
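The stopping rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the null is a fair coin with outcomes $\pm 1$, initial wealth is 1, and the gambler always stakes a fixed fraction of current wealth on $+1$. The function names and the fraction 0.2 are hypothetical choices, not taken from the cited works.

```python
import random


def betting_test(flips, alpha=0.05, fraction=0.2):
    """Bet a fixed fraction of current wealth on +1 each round.

    flips: iterable of +1 / -1 outcomes; the null is a fair coin.
    Returns (rejected, stopping_round, wealth_path).
    """
    wealth = 1.0                       # unit initial capital (assumption)
    path = [wealth]
    for t, x in enumerate(flips, start=1):
        stake = fraction * wealth      # predictable: depends only on the past
        wealth += stake * x            # fair double-or-nothing payoff
        path.append(wealth)
        if wealth >= 1.0 / alpha:      # Ville threshold 1/alpha
            return True, t, path
    return False, None, path


def simulate_type_one_error(n_runs=2000, horizon=500, alpha=0.05, seed=0):
    """Estimate how often the test rejects when the coin really is fair."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_runs):
        flips = (rng.choice((-1, 1)) for _ in range(horizon))
        rejected, _, _ = betting_test(flips, alpha=alpha)
        rejections += rejected
    return rejections / n_runs
```

Because the stake never exceeds current wealth, the wealth path stays positive, and the estimated rejection rate under the fair coin should sit near or below `alpha` up to simulation noise, whatever the horizon.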
2. Classical No-Borrowing and its Limitations
The canonical constraint in betting-based statistical tests is “no-borrowing”: at step $t$, the gambler is forbidden from staking more than their current wealth, i.e., $|\lambda_t| \le W_{t-1}$, ensuring $W_t \ge 0$ always. This constraint guarantees that wealth processes are nonnegative martingales or supermartingales, foundational for the validity of tail probabilities and Type I error control via Ville’s inequality.
However, this restriction imposes strict sequential limitations: if the wealth $W_t$ ever hits zero, the test cannot proceed. In contrast, in quantitative finance, the use of leverage (betting with borrowed capital) is pervasive. This motivates the question: how do sequential betting-based tests behave once the no-borrowing constraint is relaxed (Wang et al., 2024)?
3. Extensions: Borrowing, Liabilities, and Net Wealth
The extension to allowing borrowing modifies the process dynamics:
- Let $B_t$ be the funds borrowed at round $t$ ($L_t = \sum_{s \le t} B_s$ is the cumulative liability).
- The gambler’s gross wealth (before debts are repaid) is $G_t$, with $G_t = G_{t-1} + B_t + \lambda_t X_t$ for fair double-or-nothing odds on $X_t \in \{-1, +1\}$.
- The net wealth is $N_t = G_t - L_t$.
Importantly, the net-wealth process $(N_t)$ is a martingale under $H_0$ and the gross wealth decomposes as $G_t = N_t + L_t$ (Doob decomposition).
Statistical guarantees are maintained in the borrowed setting under certain controls:
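- Bounded Liabilities: If the expected cumulative liability satisfies $\mathbb{E}_{H_0}[L_\tau] \le c$ at every stopping time $\tau$, then, with unit initial wealth, the scaled gross wealth $G_t / (1 + c)$ is an e-process, so the probability of excess evidence is inflated by a factor $1 + c$.
- Lower-bounded Net Wealth: If the net wealth satisfies $N_t \ge -b$ uniformly, then $(N_t + b)/(1 + b)$ is an e-process, so Type I error control depends on the minimum allowed net wealth $-b$.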
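The bookkeeping behind these two adjustments can be sketched as follows. The sketch assumes a specific accounting that the text does not spell out: a loan is added to gross wealth at the start of a round, stakes pay fair double-or-nothing on outcomes $\pm 1$, and initial wealth is 1. All function and parameter names are illustrative.

```python
def run_with_borrowing(flips, stakes, loans, initial_wealth=1.0):
    """Track (gross wealth, cumulative liability, net wealth) per round.

    flips:  +1 / -1 outcomes (fair coin under the null)
    stakes: signed amount wagered on +1 each round
    loans:  amount borrowed at the start of each round
    """
    gross, liability = initial_wealth, 0.0
    history = []
    for x, stake, loan in zip(flips, stakes, loans):
        gross += loan                  # borrowed cash raises gross wealth
        liability += loan              # and the cumulative liability equally
        if abs(stake) > gross:
            raise ValueError("stake exceeds available gross wealth")
        gross += stake * x             # fair double-or-nothing payoff
        history.append((gross, liability, gross - liability))
    return history


def debt_adjusted_from_gross(gross, liability_bound):
    """Scale gross wealth by 1 / (1 + c), c bounding expected liabilities."""
    return gross / (1.0 + liability_bound)


def debt_adjusted_from_net(net, net_floor):
    """Shift and scale net wealth, assuming net >= -net_floor throughout."""
    return (net + net_floor) / (1.0 + net_floor)
```

For example, borrowing one unit in the first round and staking the whole two units of gross wealth leaves the gambler with gross wealth 4 and net wealth 3 after a win; both adjustments with bound 1 report the value 2 rather than the raw 4 or 3.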
Borrowing-based strategies can always be decomposed as convex combinations (“episodes”) of standard (no-borrowing) e-values (Wang et al., 2024).
4. Leverage Invariance and the Futility of Unconstrained Borrowing
A critical finding is leverage invariance: for any random variable $E$, transforming $E$ by a leverage map and then standardizing the result to match e-value properties does not increase the maximal achievable evidence under any alternative. All borrow-adjusted e-values are equivalent, in maximal expectation, to standard (no-borrowing) e-values after proper normalization.
This means that systematic “tricks” with leverage or variable borrowing cannot outpace the best possible evidence achievable via conventional, no-borrowing betting strategies. Intuitive attempts to “turbocharge” sequential tests by exploiting borrowing are rendered futile after proper adjustment for liabilities.
5. Bargaining and Arbitrage: Mispriced Odds and Interest
When the “odds” offered in the betting game are mispriced in the statistician’s favor (e.g., a payoff inflated by a factor $1 + r_t$ on a correct bet for some $r_t > 0$), risk-free arbitrage becomes possible. This artificially inflates the gambler’s wealth even under the null, introducing misleading or unbounded evidence.
To compensate, one divides the gross wealth $G_t$ by the product of all mispricing factors, i.e., computes the “numéraire-adjusted” wealth $G_t / \prod_{s \le t} (1 + r_s)$, where $1 + r_s$ is the mispricing factor at round $s$. This restores an (almost) supermartingale structure (Wang et al., 2024). A maximal inequality shows that unless the sum $\sum_t r_t$ is vanishingly small, the claimed evidence cannot be interpreted as statistical, since arbitrage dominates.
Interest accrual on loans (charging the same rate $r_t$ as the mispricing at each step) realigns the statistical and financial notions of fair evidence: only then does net wealth, properly adjusted, regain the true martingale property, restoring the correct e-process interpretation.
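A small sketch can make the arbitrage and its correction concrete. It assumes one particular payoff model that the text does not fix: a winning stake earns $(1 + r)$ per unit staked and a losing stake is forfeited, so splitting the bankroll evenly across both outcomes gains a factor $1 + r/2$ per round with no risk. The function names are illustrative.

```python
def hedged_arbitrage_path(rates, initial_wealth=1.0):
    """Wealth from splitting the whole bankroll evenly over both outcomes.

    Assumed payoff model: the winning half earns (1 + r) per unit staked,
    the losing half is forfeited, so wealth grows by (1 + r / 2) for sure.
    """
    wealth, path = initial_wealth, []
    for r in rates:
        half = wealth / 2.0
        wealth = wealth + half * (1.0 + r) - half
        path.append(wealth)
    return path


def numeraire_adjusted(gross_path, mispricing_rates):
    """Divide wealth after round t by the product of (1 + r_s), s <= t."""
    adjusted, discount = [], 1.0
    for gross, rate in zip(gross_path, mispricing_rates):
        discount *= 1.0 + rate
        adjusted.append(gross / discount)
    return adjusted
```

Under this model the raw path rises every round even though no information about the null has been used, while the adjusted path never exceeds the initial wealth, so the riskless gain is not counted as evidence.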
6. Implications, Pathologies, and Statistical Principles
Testing by betting offers several crucial advantages:
- Anytime validity: E-processes guarantee Type I error control regardless of stopping rules, unlike p-values, which only control the level at a prespecified sample size (Shafer, 2019; Waudby-Smith et al., 2020).
- Multiplicative accumulation: Evidence builds multiplicatively by default, in contrast to the shrinking properties of p-values under repeated testing.
- Weak assumptions: The martingale property and e-process definition often hold under minimal parametric or structural requirements; they are robust to misspecification, adversarial manipulation, and, with appropriate adjustment, even to borrowing and mispriced odds.
However, unrestrained borrowing or acceptance of mispriced odds without proper interest adjustments fundamentally undermines the validity of betting-based evidence. In such cases, the e-process may report spurious evidence due to pure arbitrage rather than statistical truth. All valid extensions must invoke normalization or constraint factors (such as the factor $1/(1 + c)$ for liabilities bounded in expectation by $c$, or numéraire adjustment for odds) to maintain statistical meaning.
As a concrete illustration, in a biased coin test with betting at the Kelly-optimal fraction, the gambler's log-wealth typically grows at rate $\mathrm{KL}(Q \,\|\, P_0)$ per round under the alternative $Q$, where $P_0$ is the null distribution; the corresponding p-value decays at this rate, so the evidence grows exponentially in the number of observations in the presence of signal.
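The growth rate in this illustration can be checked numerically. The sketch below assumes a fair-coin null, an alternative with heads probability `p` strictly between 0.5 and 1, and a wealth multiplier of `1 + fraction * x` for outcomes `x` of $\pm 1$; logarithms are natural, and the function names are illustrative.

```python
import math


def expected_log_growth(p, fraction):
    """Expected per-round log-wealth growth when betting `fraction` on heads."""
    return p * math.log(1.0 + fraction) + (1.0 - p) * math.log(1.0 - fraction)


def kelly_fraction(p):
    """Maximizer of expected_log_growth for even-money bets on a p-coin."""
    return 2.0 * p - 1.0


def kl_to_fair_coin(p):
    """KL divergence from Bernoulli(p) to Bernoulli(1/2), in nats."""
    return p * math.log(2.0 * p) + (1.0 - p) * math.log(2.0 * (1.0 - p))
```

Setting the derivative of the expected log growth to zero gives the fraction $2p - 1$, and substituting it back yields exactly the KL divergence to the fair coin, so `expected_log_growth(p, kelly_fraction(p))` and `kl_to_fair_coin(p)` agree up to rounding.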
7. Summary Table: Core Objects in Borrowing/Bargaining Extensions
| Object | Mathematical Definition | Statistical Role |
|---|---|---|
| Gross wealth | $G_t = N_t + L_t$ | Measures cumulative evidence (before adjusting for debt) |
| Liabilities | $L_t = \sum_{s \le t} B_s$ | Total amount borrowed |
| Net wealth | $N_t = G_t - L_t$ | True value after debts |
| Debt-adjusted e-value | $G_t/(1+c)$, $(N_t+b)/(1+b)$ | Controls Type I error at adjusted levels |
| Numéraire-adjusted | $G_t / \prod_{s \le t} (1 + r_s)$ | Restores supermartingale after mispricing |
Debt- and arbitrage-adjusted statistics are indispensable for maintaining interpretability and risk control in applications of testing by betting that go beyond the classical no-borrowing regime (Wang et al., 2024).