Beta-Binomial Distribution

Updated 18 November 2025

The beta-binomial distribution is a compound model that combines binomial outcomes with beta-distributed success probabilities, which naturally accounts for overdispersion.
It underpins Bayesian predictive inference by updating beta priors with observed binomial data, yielding closed-form posterior and predictive densities.
Extensions like the tilted beta-binomial and generalized score distribution enhance modeling flexibility for varied dispersion regimes and high-dimensional data.

The beta-binomial distribution is a compound discrete distribution arising when the probability parameter of a binomial distribution is treated as a random variable following a beta distribution. Its flexibility and analytic tractability make it a foundational tool for Bayesian inference, overdispersion modeling, credible region construction, and regression in applied statistics and machine learning.

1. Definition and Formal Properties

Given $Y \mid p \sim \mathrm{Binomial}(n,p)$ and $p \sim \mathrm{Beta}(\alpha, \beta)$, the marginal distribution of $Y$ is the beta-binomial: $P(Y = k) = \binom{n}{k} \frac{B(k+\alpha, n-k + \beta)}{B(\alpha, \beta)}, \qquad k = 0, 1, \dots, n$, where $B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt$ denotes the beta function (Cifuentes-Amado et al., 2019, Sharony, 2017).
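As a minimal sketch of the PMF above, the following Python snippet evaluates the beta function ratio on the log scale with `math.lgamma`. The function names `log_beta` and `beta_binomial_pmf` are illustrative choices, not taken from the cited works, and the direct use of `math.comb` assumes a moderate $n$.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Return log B(a, b), computed through log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """Return P(Y = k) for Y ~ BetaBinomial(n, alpha, beta)."""
    if not 0 <= k <= n:
        return 0.0
    log_ratio = log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    return comb(n, k) * exp(log_ratio)


# The probabilities over k = 0, ..., n should sum to one.
pmf = [beta_binomial_pmf(k, 10, 2.0, 3.0) for k in range(11)]
total = sum(pmf)

# With alpha = beta = 1 (a uniform prior on p) the count is uniform on 0..n.
uniform = [beta_binomial_pmf(k, 4, 1.0, 1.0) for k in range(5)]
```

With a uniform prior the five probabilities in `uniform` all equal $1/5$, which is a quick sanity check on the formula.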

The parameters $\alpha$ and $\beta$ serve as shape parameters for the underlying beta prior. The mean and variance of $Y$ are: $\begin{aligned} \mathrm{E}[Y] &= n\,\frac{\alpha}{\alpha+\beta} \\ \mathrm{Var}[Y] &= n\,\frac{\alpha\beta\,(n+\alpha+\beta)}{(\alpha+\beta)^2(\alpha+\beta+1)} \end{aligned}$ For $n > 1$ the variance exceeds that of the binomial distribution with the same mean by the factor $(n+\alpha+\beta)/(\alpha+\beta+1)$, and the two coincide only in the limit $\alpha+\beta \rightarrow \infty$; this excess is the overdispersion the model captures. Higher factorial moments take the form: $E[Y(Y-1)\cdots(Y-r+1)] = n_{(r)}\,\frac{(\alpha)_{(r)}}{(\alpha+\beta)_{(r)}}$, where $n_{(r)} = n(n-1)\cdots(n-r+1)$ is the falling factorial and $(\alpha)_{(r)} = \alpha(\alpha+1)\cdots(\alpha+r-1)$ denotes the (rising) Pochhammer symbol (Cifuentes-Amado et al., 2019).
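The closed-form moments can be checked numerically against the PMF. The sketch below, with illustrative function names and arbitrarily chosen parameter values, computes the mean and variance both ways and compares the variance with that of a binomial having the same mean.

```python
from math import comb, exp, lgamma


def beta_binomial_pmf(k, n, alpha, beta):
    """P(Y = k) for Y ~ BetaBinomial(n, alpha, beta), via log-gamma."""
    def log_beta(a, b):
        return lgamma(a) + lgamma(b) - lgamma(a + b)
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def closed_form_moments(n, alpha, beta):
    """Mean and variance from the closed-form expressions."""
    s = alpha + beta
    mean = n * alpha / s
    var = n * alpha * beta * (n + s) / (s**2 * (s + 1))
    return mean, var


def numeric_moments(n, alpha, beta):
    """Mean and variance obtained by summing over the PMF."""
    pmf = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var


n, alpha, beta = 12, 2.0, 5.0  # arbitrary illustrative values
mean_cf, var_cf = closed_form_moments(n, alpha, beta)
mean_num, var_num = numeric_moments(n, alpha, beta)

# Binomial variance at the same mean, and the resulting inflation factor.
p = alpha / (alpha + beta)
binomial_var = n * p * (1 - p)
overdispersion_factor = var_cf / binomial_var
```

Here `overdispersion_factor` equals $(n+\alpha+\beta)/(\alpha+\beta+1)$, so it grows with $n$ and shrinks toward one as $\alpha+\beta$ increases.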

2. Bayesian Predictive Inference and Posterior Updating

The beta-binomial arises naturally as the posterior predictive distribution for a future binomial count when Bayesian inference is performed with a beta prior. Given observed data $X = x$ from $n$ trials and prior parameters $(\alpha, \beta)$, the posterior on $p$ is $\mathrm{Beta}(x+\alpha, n-x+\beta)$. The predictive probability mass function for a new count $Y$ from $m$ further trials is: $f(y \mid x) = \binom{m}{y} \frac{B(x+\alpha + y, n-x + \beta + m - y)}{B(x+\alpha, n-x+\beta)}, \qquad y = 0, 1, \dots, m$ (Hamura, 2021, Westphal, 2019).

The Bayes estimator of $p$ under entropy loss is the posterior mean: $\delta_U(x) = \frac{x+\alpha}{n+\alpha+\beta}$
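The posterior update and predictive formula can be sketched in a few lines of Python. The function names and the data values ($x = 7$ successes in $n = 20$ trials, uniform prior) are assumptions made for illustration only.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_params(x, n, alpha, beta):
    """Conjugate update: Beta(alpha, beta) prior plus x successes in n trials."""
    return x + alpha, n - x + beta


def predictive_pmf(y, m, x, n, alpha, beta):
    """Posterior predictive probability of y successes in m future trials."""
    a_post, b_post = posterior_params(x, n, alpha, beta)
    log_ratio = log_beta(a_post + y, b_post + m - y) - log_beta(a_post, b_post)
    return comb(m, y) * exp(log_ratio)


x, n, alpha, beta = 7, 20, 1.0, 1.0  # hypothetical data and prior
m = 5
predictive = [predictive_pmf(y, m, x, n, alpha, beta) for y in range(m + 1)]

# For a single future trial, the predictive success probability is the posterior mean.
posterior_mean = (x + alpha) / (n + alpha + beta)
next_trial_success = predictive_pmf(1, 1, x, n, alpha, beta)
```

The predictive is simply a beta-binomial with the posterior shape parameters, so the one-trial case reduces to the posterior mean $(x+\alpha)/(n+\alpha+\beta)$.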

Truncated beta priors (with either upper or two-sided support restrictions) yield analogous expressions involving incomplete beta functions and ratios of incomplete integrals for posterior and predictive calculations (Hamura, 2021).
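For intuition, the upper-truncated case can be approximated numerically. The sketch below replaces the incomplete beta functions with a plain midpoint quadrature over $(0, \kappa]$; this is a stand-in chosen to keep the example dependency-free, not the computation used by Hamura (2021), and the parameter values and grid size are arbitrary assumptions.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def truncated_predictive_pmf(y, m, x, n, alpha, beta, kappa, grid=5000):
    """Predictive P(Y = y) when the Beta(alpha, beta) prior is truncated to (0, kappa].

    Both integrals are approximated with the midpoint rule on `grid` cells.
    """
    h = kappa / grid
    numerator = denominator = 0.0
    for i in range(grid):
        p = (i + 0.5) * h  # midpoint of the i-th cell
        posterior_kernel = p ** (x + alpha - 1) * (1 - p) ** (n - x + beta - 1)
        denominator += posterior_kernel
        numerator += posterior_kernel * p**y * (1 - p) ** (m - y)
    return comb(m, y) * numerator / denominator


def untruncated_predictive_pmf(y, m, x, n, alpha, beta):
    """Closed-form beta-binomial predictive, used here as a reference."""
    a_post, b_post = x + alpha, n - x + beta
    return comb(m, y) * exp(log_beta(a_post + y, b_post + m - y) - log_beta(a_post, b_post))


x, n, alpha, beta, m = 3, 10, 2.0, 3.0, 4  # hypothetical values
truncated = [truncated_predictive_pmf(y, m, x, n, alpha, beta, kappa=0.3) for y in range(m + 1)]
full_range = [truncated_predictive_pmf(y, m, x, n, alpha, beta, kappa=1.0) for y in range(m + 1)]
closed_form = [untruncated_predictive_pmf(y, m, x, n, alpha, beta) for y in range(m + 1)]
truncated_mean = sum(y * q for y, q in enumerate(truncated))
```

Setting $\kappa = 1$ recovers the ordinary beta-binomial predictive, while a smaller $\kappa$ shifts predictive mass toward lower counts because the posterior on $p$ can no longer exceed $\kappa$.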

3. Overdispersion, Extensions, and Regression Models

The standard beta-binomial models overdispersion through the stochasticity of $p$. Alternative formulations extend this property:

Tilted beta-binomial: Employs a convex mixture of a mean-tilted polynomial density and a beta law for $p$ . The marginal PMF is:

$P(Y = y) = \theta P_{\text{tilt}}(y \mid m, \mu_t) + (1-\theta) P_{\mathrm{BB}}(y \mid m, \mu_b, \phi)$

where $P_{\mathrm{BB}}$ is the standard beta-binomial term, and $P_{\mathrm{tilt}}$ derives from a polynomial-tilted law (Cifuentes-Amado et al., 2019). This model can be embedded in GLM-style regression with links on $\mu_b$, $\phi$, and $\theta$, and estimated via MCMC methods.

Generalized Score Distribution (GSD): Extends the beta-binomial into the underdispersed regime by parameterizing the mean and variance so that the full feasible mean-variance space on a finite discrete support is covered. The regime $0 \leq \rho < C(\psi)$ recovers a reparameterized beta-binomial; for $\rho \geq C(\psi)$, the GSD uses a mixture of binomial and deterministic components, enabling modeling of both over- and under-dispersion beyond the beta-binomial's variance floor (Ćmiel et al., 2022).

Model	Dispersion Regime	Extra Parameters
Beta-binomial	Overdispersion	$(\alpha, \beta)$
Tilted beta-binomial	Enhanced overdispersion	$(\mu_t, \mu_b, \phi, \theta)$
GSD	Under- and overdispersion	$(\psi, \rho)$

4. Multivariate Beta-Binomial and High-Dimensional Inference

The multivariate beta-binomial (Westphal, 2019) models $m$ correlated binomial marginals by inducing an $m$-dimensional beta distribution as the push-forward of a $2^m$-dimensional Dirichlet. The marginal distributions are beta-binomial, but with controlled correlations: $Y_j = \sum_{i=1}^n X_{i,j}, \; \text{with marginals} \; \theta_j \sim \mathrm{Beta}(\alpha_j, \beta_j)$. Posterior updates and credible region construction use either the full Dirichlet structure or a reduced parameterization based on first and second moments. For $m > 10$, computational tractability favors the latter. Gaussian copulas are used to approximate the marginals and the joint distribution; they outperform normal approximations for small samples or many dimensions and keep credible regions within the valid parameter space.
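The push-forward idea can be illustrated for $m = 2$. In the sketch below a Dirichlet over the $2^2 = 4$ joint outcome cells is sampled, and each $\theta_j$ is obtained by summing the cell probabilities in which coordinate $j$ equals one. The cell ordering, the concentration values, and the function names are assumptions for illustration, and the cell-to-marginal map is the natural reading of the construction rather than a verbatim reproduction of Westphal (2019). The beta marginals follow from the standard aggregation property of the Dirichlet.

```python
import random

# Joint outcome cells for two binary endpoints, with hypothetical concentrations.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
gammas = [4.0, 1.0, 2.0, 3.0]


def sample_dirichlet(concentrations, rng):
    """Draw one Dirichlet vector by normalizing independent gamma variates."""
    draws = [rng.gammavariate(g, 1.0) for g in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def marginal_thetas(cell_probs):
    """Push-forward: theta_j is the total probability of cells with a 1 in position j."""
    return [
        sum(q for cell, q in zip(cells, cell_probs) if cell[j] == 1)
        for j in range(2)
    ]


def marginal_beta_params(j):
    """Implied Beta(alpha_j, beta_j) marginal, by Dirichlet aggregation."""
    alpha_j = sum(g for cell, g in zip(cells, gammas) if cell[j] == 1)
    beta_j = sum(gammas) - alpha_j
    return alpha_j, beta_j


rng = random.Random(0)
samples = [marginal_thetas(sample_dirichlet(gammas, rng)) for _ in range(20000)]
empirical_means = [sum(s[j] for s in samples) / len(samples) for j in range(2)]
expected_means = [a / (a + b) for a, b in (marginal_beta_params(j) for j in range(2))]
```

The dependence between $\theta_1$ and $\theta_2$ is governed by how the concentration mass is split between the concordant cells $(0,0)$, $(1,1)$ and the discordant cells $(0,1)$, $(1,0)$.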

5. Risk, Dominance, and Predictive Performance

Under Kullback–Leibler divergence loss, the total risk for the beta-binomial Bayes predictive decomposes into a sum of one-step-ahead sub-risks (Hamura, 2021): $R_{m,n}(p, \pi) = \sum_{i=0}^{m-1} R_{1, n+i}(p, \pi)$. For restricted parameter regions, truncated beta priors can yield predictive estimators with uniformly lower risk than the unrestricted beta prior, subject to conditions on the truncation region (e.g., for $p \in(0,\kappa]$ when $\kappa < (n+\alpha)/(n+\alpha+\beta)$). This dominance is supported both theoretically and numerically, with similar conditions for two-sided truncation (Hamura, 2021).
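The decomposition can be checked numerically for an untruncated beta prior. The sketch below assumes that $R_{m,n}(p, \pi)$ is the expected Kullback–Leibler divergence from $\mathrm{Binomial}(m, p)$ to the Bayes predictive, averaged over $X \sim \mathrm{Binomial}(n, p)$; the function names and parameter values are illustrative.

```python
from math import comb, exp, lgamma, log


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def predictive_pmf(y, m, x, n, alpha, beta):
    """Beta-binomial Bayes predictive for y successes in m future trials."""
    a_post, b_post = x + alpha, n - x + beta
    return comb(m, y) * exp(log_beta(a_post + y, b_post + m - y) - log_beta(a_post, b_post))


def kl_risk(m, n, p, alpha, beta):
    """Expected KL divergence from Binomial(m, p) to the predictive, over X ~ Binomial(n, p)."""
    risk = 0.0
    for x in range(n + 1):
        weight = binomial_pmf(x, n, p)
        for y in range(m + 1):
            true_prob = binomial_pmf(y, m, p)
            risk += weight * true_prob * log(true_prob / predictive_pmf(y, m, x, n, alpha, beta))
    return risk


p, alpha, beta, n, m = 0.3, 1.0, 1.0, 5, 3  # arbitrary illustrative values
total_risk = kl_risk(m, n, p, alpha, beta)
stepwise_risk = sum(kl_risk(1, n + i, p, alpha, beta) for i in range(m))
```

The two quantities agree up to floating-point error, reflecting the chain rule for KL divergence applied to the exchangeable sequence of future trials.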

6. Algorithmic, Computational, and Application Contexts

In sequential decision-making, the beta-binomial is central to online optimization scenarios such as multi-armed bandit problems, providing analytic expressions (in terms of ${}_2F_1$ hypergeometric functions) for comparing payout rates between stochastic processes (Sharony, 2017). Compared to brute-force Monte Carlo, such analytic reductions offer orders-of-magnitude speedups at the same numerical precision.
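To illustrate the contrast between analytic and Monte Carlo comparison of two arms, the sketch below computes $P(p_B > p_A)$ for independent beta posteriors in two ways. The closed form used here is a finite-sum identity that holds when $\alpha_B$ is a positive integer; it is a simpler substitute for, not a reproduction of, the ${}_2F_1$ expressions in Sharony (2017). Function names, posterior parameters, and the number of draws are assumptions.

```python
import random
from math import exp, lgamma, log


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b):
    """Exact P(p_B > p_A) for independent Beta posteriors; alpha_b must be a positive integer."""
    total = 0.0
    for i in range(alpha_b):
        total += exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total


def prob_b_beats_a_mc(alpha_a, beta_a, alpha_b, beta_b, draws=50000, seed=0):
    """Monte Carlo estimate of the same probability, for comparison."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(alpha_b, beta_b) > rng.betavariate(alpha_a, beta_a)
        for _ in range(draws)
    )
    return wins / draws


# Hypothetical arms: A with 12 successes in 30 pulls, B with 18 in 30, uniform priors.
exact = prob_b_beats_a(13, 19, 19, 13)
estimate = prob_b_beats_a_mc(13, 19, 19, 13)
```

The exact sum needs only $\alpha_B$ terms, whereas the Monte Carlo estimate needs many thousands of draws to reach a couple of decimal places of accuracy.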

Estimation for beta-binomial and GSD models can be handled by the method of moments, maximum likelihood, or Bayesian approaches. For GSD, explicit formulas for MLE and moment equations enable efficient parameter recovery; for regression and hierarchical modeling, standard MCMC packages such as OpenBUGS are used, leveraging closed-form PMFs (Cifuentes-Amado et al., 2019, Ćmiel et al., 2022).
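As a sketch of the method of moments for the plain beta-binomial with known $n > 1$, the mean and variance formulas can be inverted through $\pi = \alpha/(\alpha+\beta)$ and $\rho = 1/(\alpha+\beta+1)$, using $\mathrm{Var}[Y] = n\pi(1-\pi)\,(1 + (n-1)\rho)$. The function name is illustrative, and the round trip below feeds in exact moments rather than sample moments so that the result is deterministic.

```python
def method_of_moments(mean, var, n):
    """Recover (alpha, beta) of a beta-binomial from its mean and variance, for n > 1."""
    pi_hat = mean / n
    binomial_var = n * pi_hat * (1 - pi_hat)
    # Intra-class correlation implied by the variance inflation over the binomial.
    rho_hat = (var / binomial_var - 1) / (n - 1)
    if rho_hat <= 0:
        raise ValueError("variance does not exceed the binomial variance; no beta-binomial fit")
    s_hat = 1 / rho_hat - 1  # estimate of alpha + beta
    return pi_hat * s_hat, (1 - pi_hat) * s_hat


# Round trip: exact moments for known parameters should return those parameters.
n, alpha, beta = 12, 2.0, 5.0
s = alpha + beta
true_mean = n * alpha / s
true_var = n * alpha * beta * (n + s) / (s**2 * (s + 1))
alpha_hat, beta_hat = method_of_moments(true_mean, true_var, n)
```

In practice the sample mean and sample variance of observed counts would replace `true_mean` and `true_var`, and the estimator fails (here by raising `ValueError`) when the data show no overdispersion relative to the binomial.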

The beta-binomial forms a fundamental predictive tool in Bayesian statistics, supports robust inference under uncertainty and overdispersion, and, through its extensions, provides comprehensive modeling of discrete count data in both univariate and multivariate high-dimensional settings.