Kelly Betting as Bayesian Model Evaluation
- The paper introduces Kelly betting as a principled method for real-time Bayesian model evaluation, equating wealth growth with posterior credibility updates.
- It demonstrates the equivalence between optimal wagering, minimizing cumulative log-loss, and Bayesian updating through analytic links to KL divergence and regret bounds.
- The method supports online updating and market consensus formation using both full and fractional Kelly betting, improving model-selection accuracy.
Kelly betting as Bayesian model evaluation provides a mathematically rigorous framework for real-time, sequential assessment of probabilistic forecasting models. By treating each model or agent as a Kelly bettor and interpreting their evolving bankrolls as Bayesian credibilities, this approach unifies prediction-market dynamics, strictly proper scoring, information-theoretic optimality, and Bayesian model averaging. It yields analytic connections between log-loss, Kullback-Leibler divergence, and the rate at which the best model can be distinguished from suboptimal alternatives, while also supporting online updating and market consensus formation (Beygelzimer et al., 2012; Beuoy, 10 Feb 2026).
1. Mathematical Foundations and Setup
Consider $K$ competing models forecasting a sequence of binary outcomes $y_t \in \{0,1\}$ for $t = 1, \dots, T$. Each model $i$ outputs an updated predictive probability $p_{i,t} = P_i(y_t = 1 \mid y_{1:t-1})$. Each model is assigned a bankroll (credibility) $B_{i,t}$, initialized to the prior $B_{i,0} = P(M_i)$ with normalization $\sum_i B_{i,0} = 1$ (Beuoy, 10 Feb 2026).
At each round, models bet as Kelly agents against a “market” consensus probability $q_t$, with bet fractions given by the classical Kelly formula:

$$b_{i,t} = \frac{p_{i,t} - q_t}{1 - q_t}.$$

Once the outcome $y_t$ is revealed, each model's bankroll is updated multiplicatively:

$$B_{i,t} = B_{i,t-1} \times \begin{cases} p_{i,t}/q_t, & y_t = 1, \\ (1 - p_{i,t})/(1 - q_t), & y_t = 0, \end{cases}$$

or, equivalently, $B_{i,t} = B_{i,t-1}\, P_i(y_t)/Q_t(y_t)$, where $Q_t(1) = q_t$ and $Q_t(0) = 1 - q_t$ (Beygelzimer et al., 2012; Beuoy, 10 Feb 2026).
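A minimal numerical sketch of the update rule (the values of `p` and `q` are illustrative, not from the paper): the Kelly bet, once settled, multiplies the bankroll by exactly the likelihood ratio of the model's forecast against the market price.

```python
def kelly_fraction(p, q):
    """Kelly bet fraction for a binary claim priced at q, believed at p."""
    return (p - q) / (1 - q)

def bankroll_factor(p, q, y):
    """Multiplicative wealth update after outcome y for a Kelly bettor."""
    b = kelly_fraction(p, q)
    return 1 + b * (1 / q - 1) if y == 1 else 1 - b

# The Kelly update reduces to a likelihood ratio against the market price:
p, q = 0.7, 0.55
assert abs(bankroll_factor(p, q, 1) - p / q) < 1e-12
assert abs(bankroll_factor(p, q, 0) - (1 - p) / (1 - q)) < 1e-12
```

Negative fractions (when $p < q$) simply correspond to betting on the complementary outcome; the same two formulas cover both cases.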
2. Equivalence to Bayesian Model Evaluation
The growth of $B_{i,t}$ implements exact Bayesian updating for model credibility:

$$B_{i,t} = \frac{B_{i,0} \prod_{s=1}^{t} P_i(y_s)}{\sum_j B_{j,0} \prod_{s=1}^{t} P_j(y_s)} = P(M_i \mid y_{1:t}).$$

This alignment is seen by noting $B_{i,t} = B_{i,t-1}\, P_i(y_t)/Q_t(y_t)$ and that the “market” aggregates model forecasts into $q_t = \sum_j B_{j,t-1}\, p_{j,t}$, so that $Q_t(y_t)$ is exactly the Bayesian mixture (marginal) likelihood. The ratio $B_{i,t}/B_{j,t}$ exactly matches the posterior odds

$$\frac{B_{i,t}}{B_{j,t}} = \frac{P(M_i) \prod_{s \le t} P_i(y_s)}{P(M_j) \prod_{s \le t} P_j(y_s)}.$$

This shows that Kelly betting yields the same sequential model evidence as Bayesian filtering, with bankrolls serving as normalized posterior credibilities at every time step (Beygelzimer et al., 2012; Beuoy, 10 Feb 2026).
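The equivalence can be checked directly: running the Kelly market for a few rounds reproduces the Bayesian posterior over models computed from the raw likelihoods. A sketch with two hypothetical constant-forecast models and an arbitrary outcome sequence:

```python
import math

# Two hypothetical models with constant forecasts P(y_t = 1)
forecasts = [0.8, 0.4]
prior = [0.5, 0.5]
outcomes = [1, 1, 0, 1]

# Kelly market: bankrolls updated by likelihood ratio against the price
B = prior[:]
for y in outcomes:
    q = sum(B[i] * forecasts[i] for i in range(2))
    lik = [p if y == 1 else 1 - p for p in forecasts]
    B = [B[i] * lik[i] / (q if y == 1 else 1 - q) for i in range(2)]

# Direct Bayesian posterior from the same likelihoods
L = [math.prod((p if y == 1 else 1 - p) for y in outcomes) for p in forecasts]
Z = sum(prior[i] * L[i] for i in range(2))
posterior = [prior[i] * L[i] / Z for i in range(2)]

assert all(abs(B[i] - posterior[i]) < 1e-12 for i in range(2))
```

The two computations agree to floating-point precision, because the market price is exactly the Bayesian mixture likelihood at each step.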
3. Market Aggregation, Log-Loss, and Regret
At equilibrium, the market price is the consensus forecast

$$q_t = \sum_i w_{i,t-1}\, p_{i,t},$$

with $w_{i,t-1} = B_{i,t-1}$ interpreted as normalized model credibilities. The incremental log-growth for model $i$ satisfies

$$\log B_{i,t} - \log B_{i,t-1} = \ell_t(Q) - \ell_t(P_i),$$

where $\ell_t(P) = -\log P(y_t)$ is the log-loss. Thus, maximizing log-bankroll aligns with minimizing cumulative log-loss against the market mixture (Beuoy, 10 Feb 2026).
The expected excess growth rate is given by the negative KL divergence between the true data-generating distribution $\pi$ and model $i$:

$$\mathbb{E}_\pi[\log P_i(y_t)] - \mathbb{E}_\pi[\log \pi(y_t)] = -D_{\mathrm{KL}}(\pi \,\|\, P_i).$$

A worst-case log regret bound follows via wealth conservation in prediction markets: after $T$ rounds,

$$L_Q(T) \le L_i(T) + \log \frac{1}{w_{i,0}},$$

where $L_Q(T) = \sum_{t=1}^{T} \ell_t(Q)$ is the market log-loss, $L_i(T)$ is that for agent $i$, and $w_{i,0}$ is the prior wealth. The term $\log(1/w_{i,0})$ is the Bayesian model-evidence penalty term for expert $i$ (Beygelzimer et al., 2012).
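The wealth-conservation argument behind the regret bound can be verified numerically. A sketch with three hypothetical constant-forecast agents, uniform prior wealth, and a simulated Bernoulli stream (all parameters illustrative):

```python
import math
import random

random.seed(0)
K, T = 3, 200
models = [0.3, 0.5, 0.7]            # hypothetical constant forecasts
w0 = [1 / K] * K                    # uniform prior wealth
outcomes = [1 if random.random() < 0.65 else 0 for _ in range(T)]

B = w0[:]
L_market = 0.0                      # cumulative market log-loss
L_model = [0.0] * K                 # cumulative per-agent log-loss
for y in outcomes:
    q = sum(B[i] * models[i] for i in range(K))
    qy = q if y == 1 else 1 - q
    L_market += -math.log(qy)
    for i in range(K):
        py = models[i] if y == 1 else 1 - models[i]
        L_model[i] += -math.log(py)
        B[i] *= py / qy             # Kelly / Bayes wealth update

# Market log-loss beats every agent up to the log(1/w0_i) penalty
for i in range(K):
    assert L_market <= L_model[i] + math.log(1 / w0[i]) + 1e-9
```

The bound holds on every sample path, not just in expectation, because the market's cumulative likelihood is a wealth-weighted sum that dominates each agent's contribution.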
4. Posterior Evolution and Beta-Binomial Dynamics
Given a data stream $y_1, \dots, y_T \overset{\text{iid}}{\sim} \operatorname{Bernoulli}(\pi)$, assign a continuum of Bernoulli bettors with beliefs $\theta \in [0,1]$ and initial wealth density $\operatorname{Beta}(\alpha, \beta)$. The market price sequence updates exactly as the posterior mean of a Beta-Binomial model:

$$q_t = \frac{\alpha + s_{t-1}}{\alpha + \beta + t - 1}, \qquad s_{t-1} = \sum_{s < t} y_s.$$

Thus, the market price acts as the posterior predictive mean, reflecting aggregate learning as in Bayesian inference (Beygelzimer et al., 2012).
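Discretizing the continuum of Bernoulli bettors on a grid recovers this behavior numerically. A sketch assuming a uniform (Beta(1, 1)-like) prior wealth density approximated on a midpoint grid; the outcome sequence is illustrative:

```python
# Grid of Bernoulli bettors with beliefs theta and uniform prior wealth
N = 2001
thetas = [(k + 0.5) / N for k in range(N)]
B = [1 / N] * N
outcomes = [1, 0, 1, 1, 0, 1, 1]

s = 0  # running success count
for t, y in enumerate(outcomes, start=1):
    q = sum(B[k] * thetas[k] for k in range(N))      # market price
    # Price matches the Beta(1,1)-Binomial posterior predictive mean
    assert abs(q - (1 + s) / (2 + t - 1)) < 1e-3
    qy = q if y == 1 else 1 - q
    B = [B[k] * (thetas[k] if y == 1 else 1 - thetas[k]) / qy
         for k in range(N)]
    s += y
```

Each bettor's wealth evolves as the (discretized) Beta posterior density, so the wealth-weighted average belief is the posterior mean up to the small grid-discretization error.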
5. Fractional Kelly, Tempered Posteriors, and Discounting
Fractional Kelly betting generalizes the approach by scaling the bet size by a confidence parameter $\lambda_i \in [0,1]$:

$$b_{i,t}^{(\lambda_i)} = \lambda_i\, b_{i,t} = \lambda_i\, \frac{p_{i,t} - q_t}{1 - q_t}.$$

A bettor using fractional Kelly acts as a full Kelly bettor on the tempered belief $\tilde p_{i,t} = \lambda_i p_{i,t} + (1 - \lambda_i) q_t$. The consensus price becomes

$$q_t = \frac{\sum_i w_{i,t-1}\, \lambda_i\, p_{i,t}}{\sum_i w_{i,t-1}\, \lambda_i}.$$

When all $\lambda_i = \lambda < 1$, this implements a market tracking a discounted Bernoulli process, with the price converging to a time-discounted frequency. Empirically, the price behaves like an exponentially weighted estimate

$$q_t \approx \frac{\sum_{s \le t} \gamma^{\,t-s}\, y_s}{\sum_{s \le t} \gamma^{\,t-s}},$$

where $\gamma \in (0,1)$ is an effective discount factor determined by $\lambda$ (approaching $1$, i.e., no discounting, as $\lambda \to 1$).
This provides a probabilistic interpretation for fractional Kelly betting and ties it to credibility discounting (Beygelzimer et al., 2012).
6. Empirical Performance and Metric Comparison
In simulation studies involving binary outcome sequences (e.g., “volleyball” matches to 100 points), Kelly-Bayes evaluation is compared to log-loss and Brier score for the task of model selection:
- When the alternative model uses an incorrect but fixed win probability, Kelly selects the true model more often than log-loss/Brier (e.g., 55% vs 50%).
- For alternatives with recency bias, Kelly achieves substantially higher model-picking accuracy (96% vs 73% for log-loss).
- For alternatives with random drift, Kelly outperforms log-loss (74% vs 58%).
Over repeated matches, when the ending bankroll is carried over as the prior, Kelly-Bayes quickly comes to dominate these classical metrics in model-selection accuracy (Beuoy, 10 Feb 2026).
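The selection procedure in these experiments can be sketched as follows (illustrative parameters, not the paper's exact setup): simulate a Bernoulli stream, run a two-model Kelly market, and select the model with the larger terminal bankroll. The terminal log-bankroll ratio is exactly the log prior odds plus the cumulative log-likelihood ratio, which the sketch asserts:

```python
import math
import random

random.seed(42)
pi_true = 0.6
T = 500
outcomes = [1 if random.random() < pi_true else 0 for _ in range(T)]

# Hypothetical competitors: the true model vs a fixed 0.5 forecaster
models = {"true": pi_true, "fixed": 0.5}
B = {m: 0.5 for m in models}        # equal prior credibility
llr = 0.0                           # running log-likelihood ratio
for y in outcomes:
    q = sum(B[m] * models[m] for m in models)
    qy = q if y == 1 else 1 - q
    for m in models:
        py = models[m] if y == 1 else 1 - models[m]
        B[m] *= py / qy
    llr += math.log((pi_true if y == 1 else 1 - pi_true) / 0.5)

# Posterior odds = prior odds (here 1) times the likelihood ratio
assert abs(math.log(B["true"] / B["fixed"]) - llr) < 1e-9
selected = max(B, key=B.get)
```

With a data-generating probability away from 0.5 and enough rounds, the true model's bankroll grows at rate roughly $D_{\mathrm{KL}}(\pi \,\|\, 0.5)$ per round relative to the fixed forecaster, so it is selected with high probability.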
7. Real-Time Implementation and Generalizations
At each time step, the market consensus is computed as the bankroll-weighted average of model forecasts:

$$q_t = \sum_i B_{i,t-1}\, p_{i,t}.$$

Each bankroll is updated as above, and normalization ensures $B_{i,t}$ retains its interpretation as a posterior credibility. In the multinomial (multi-outcome) case, market clearing requires solving an eigenvector equation $M\mathbf{w} = \mathbf{w}$, where $M$ encodes model outcome probabilities and $\mathbf{w}$ is the hypothetical terminal wealth.
Pseudocode for the binary case is as follows:
```python
# B[i]: bankroll of model i, initialized to the prior; sum(B) == 1
B = [prior[i] for i in range(K)]

for t in range(T):
    # 1. Read forecasts p[i] = model i's P(y_t = 1), then price the market
    q = sum(B[i] * p[i] for i in range(K))
    # 2. Compute Kelly fractions
    b = [(p[i] - q) / (1 - q) for i in range(K)]
    # 3. Observe outcome o in {0, 1} and settle bets
    for i in range(K):
        if o == 1:
            B[i] *= 1 + b[i] * (1 / q - 1)   # equals p[i] / q
        else:
            B[i] *= 1 - b[i]                 # equals (1 - p[i]) / (1 - q)
    # 4. Normalize (guards against floating-point drift)
    Z = sum(B)
    B = [Bi / Z for Bi in B]
```
Conclusion
Kelly betting yields a formal equivalence between wealth maximization via optimal sequential wagering and Bayesian model evaluation. In both theoretical and empirical terms, this approach implements real-time, order-aware, and posterior-consistent updates of model credibility, recovers traditional Bayesian principles in aggregate, and supports discounted or tempered updates via fractional Kelly. Market prices correspond to posterior-predictive means, and the worst-case regret bounds have the interpretation of Bayesian model-selection penalties. Kelly-based Bayesian evaluation thus provides a principled alternative to classical scoring rules for sequential, real-time model assessment and aggregation (Beygelzimer et al., 2012; Beuoy, 10 Feb 2026).