Beta-Binomial Distribution

Updated 18 November 2025
  • The beta-binomial distribution is a compound model that combines binomial outcomes with beta-distributed success probabilities, which naturally accounts for overdispersion.
  • It underpins Bayesian predictive inference by updating beta priors with observed binomial data, yielding closed-form posterior and predictive densities.
  • Extensions like the tilted beta-binomial and generalized score distribution enhance modeling flexibility for varied dispersion regimes and high-dimensional data.

The beta-binomial distribution is a compound discrete distribution arising when the probability parameter of a binomial distribution is treated as a random variable following a beta distribution. Its flexibility and analytic tractability make it a foundational tool for Bayesian inference, overdispersion modeling, credible region construction, and regression in applied statistics and machine learning.

1. Definition and Formal Properties

Given $Y \mid p \sim \mathrm{Binomial}(n,p)$ and $p \sim \mathrm{Beta}(\alpha, \beta)$, the marginal distribution of $Y$ is the beta-binomial:

$$P(Y = k) = \binom{n}{k} \frac{B(k+\alpha, n-k + \beta)}{B(\alpha, \beta)}, \qquad k = 0, 1, \dots, n,$$

where $B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt$ denotes the beta function (Cifuentes-Amado et al., 2019, Sharony, 2017).
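The PMF above can be evaluated stably in log space. The following is a minimal sketch using only the Python standard library; the helper names `log_beta` and `betabinom_pmf` are illustrative and do not come from the cited papers.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed from log-gamma to avoid overflow."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(Y = k) for Y ~ BetaBinomial(n, alpha, beta)."""
    if not 0 <= k <= n:
        return 0.0
    # log of the binomial coefficient C(n, k)
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


# Example: the PMF over the full support should sum to one.
pmf = [betabinom_pmf(k, 10, 2.0, 3.0) for k in range(11)]
print(sum(pmf))
```

With $\alpha = \beta = 1$ the prior on $p$ is uniform and the PMF reduces to the discrete uniform distribution on $\{0, \dots, n\}$, which makes a convenient sanity check.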

The parameters $\alpha$ and $\beta$ serve as shape parameters for the underlying beta prior. The mean and variance of $Y$ are:

\begin{align*}
\mathrm{E}[Y] &= n\,\frac{\alpha}{\alpha+\beta} \\
\mathrm{Var}[Y] &= n\,\frac{\alpha\beta\,(n+\alpha+\beta)}{(\alpha+\beta)^2(\alpha+\beta+1)}
\end{align*}

Writing $\mu = \alpha/(\alpha+\beta)$, the variance factors as $n\mu(1-\mu)\left[1 + \frac{n-1}{\alpha+\beta+1}\right]$. For $n > 1$ it therefore exceeds the variance $n\mu(1-\mu)$ of the binomial distribution with the same mean, and the excess vanishes only as $\alpha+\beta \rightarrow \infty$. This is how the model captures overdispersion. Factorial moments take the form:

$$\mathrm{E}[Y(Y-1)\cdots(Y-r+1)] = n_{(r)}\,\frac{(\alpha)_{(r)}}{(\alpha+\beta)_{(r)}},$$

where $n_{(r)} = n(n-1)\cdots(n-r+1)$ is the falling factorial and $(\alpha)_{(r)} = \alpha(\alpha+1)\cdots(\alpha+r-1)$ denotes the Pochhammer symbol (rising factorial) (Cifuentes-Amado et al., 2019).
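As a quick check of the moment formulas, the sketch below compares the closed-form mean and variance with values summed directly from the PMF, and with the variance of a binomial distribution that has the same mean. The function names and the example parameters ($n = 10$, $\alpha = 2$, $\beta = 3$) are illustrative choices.

```python
from math import exp, lgamma


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(Y = k) for Y ~ BetaBinomial(n, alpha, beta), computed in log space."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_num = lgamma(k + alpha) + lgamma(n - k + beta) - lgamma(n + alpha + beta)
    log_den = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    return exp(log_comb + log_num - log_den)


def betabinom_mean_var(n: int, alpha: float, beta: float) -> tuple[float, float]:
    """Closed-form mean and variance of the beta-binomial."""
    s = alpha + beta
    mean = n * alpha / s
    var = n * alpha * beta * (s + n) / (s ** 2 * (s + 1))
    return mean, var


n, alpha, beta = 10, 2.0, 3.0
pmf = [betabinom_pmf(k, n, alpha, beta) for k in range(n + 1)]

# Moments obtained by direct summation over the support.
mean_num = sum(k * p for k, p in enumerate(pmf))
var_num = sum((k - mean_num) ** 2 * p for k, p in enumerate(pmf))

mean, var = betabinom_mean_var(n, alpha, beta)
binom_var = mean * (1 - mean / n)  # binomial variance at the same mean

print(mean, var, binom_var)
```

For these parameters the beta-binomial variance is 2.5 times the matching binomial variance, in line with the factor $1 + (n-1)/(\alpha+\beta+1)$.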

2. Bayesian Predictive Inference and Posterior Updating

The beta-binomial arises naturally as the posterior predictive for a future binomial count when Bayesian inference is performed with a beta prior. Given observed data $X = x$ from $n$ trials and prior parameters $(\alpha, \beta)$, the posterior on $p$ is $\mathrm{Beta}(x+\alpha, n-x+\beta)$. The predictive density for a new sample $Y$ of $m$ trials is:

$$f(y \mid x) = \binom{m}{y} \frac{B(x+\alpha + y,\; n-x + \beta + m - y)}{B(x+\alpha,\; n-x+\beta)}$$

(Hamura, 2021, Westphal, 2019).

The Bayes estimator of $p$ under entropy loss is the posterior mean:

$$\delta_U(x) = \frac{x+\alpha}{n+\alpha+\beta}$$
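The conjugate update and the predictive density can be sketched in a few lines. This is an illustrative standard-library sketch rather than code from the cited papers; the function names and the example data (7 successes in 20 trials, a uniform prior, 5 future trials) are assumptions made for the example.

```python
from math import exp, lgamma


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(Y = k) for Y ~ BetaBinomial(n, alpha, beta), computed in log space."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_num = lgamma(k + alpha) + lgamma(n - k + beta) - lgamma(n + alpha + beta)
    log_den = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    return exp(log_comb + log_num - log_den)


def posterior_params(x: int, n: int, alpha: float, beta: float) -> tuple[float, float]:
    """Beta posterior parameters after x successes in n trials."""
    return x + alpha, n - x + beta


def predictive_pmf(y: int, m: int, x: int, n: int, alpha: float, beta: float) -> float:
    """f(y | x): a beta-binomial over m future trials with posterior parameters."""
    a_post, b_post = posterior_params(x, n, alpha, beta)
    return betabinom_pmf(y, m, a_post, b_post)


def posterior_mean(x: int, n: int, alpha: float, beta: float) -> float:
    """Posterior mean of p, (x + alpha) / (n + alpha + beta)."""
    return (x + alpha) / (n + alpha + beta)


# Hypothetical data: 7 successes in 20 trials, uniform Beta(1, 1) prior.
x, n, m = 7, 20, 5
alpha, beta = 1.0, 1.0
pred = [predictive_pmf(y, m, x, n, alpha, beta) for y in range(m + 1)]
pred_mean = sum(y * p for y, p in enumerate(pred))

print(posterior_mean(x, n, alpha, beta), pred_mean)
```

The predictive mean equals $m$ times the posterior mean of $p$, which ties the predictive density back to the point estimator.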

Truncated beta priors (with either upper or two-sided support restrictions) yield analogous expressions involving incomplete beta functions and ratios of incomplete integrals for posterior and predictive calculations (Hamura, 2021).

3. Overdispersion, Extensions, and Regression Models

The standard beta-binomial models overdispersion through the stochasticity of pp. Alternative formulations extend this property:

  • Tilted beta-binomial: Employs a convex mixture of a mean-tilted polynomial density and a beta law for $p$. The marginal PMF is:

    $$P(Y = y) = \theta\, P_{\text{tilt}}(y \mid m, \mu_t) + (1-\theta)\, P_{\mathrm{BB}}(y \mid m, \mu_b, \phi)$$

    where $P_{\mathrm{BB}}$ is the standard beta-binomial term, and $P_{\text{tilt}}$ derives from a polynomial-tilted law (Cifuentes-Amado et al., 2019). This model can be embedded in GLM-style regression with links on $\mu_b$, $\phi$, and $\theta$, and estimated via MCMC methods.

  • Generalised Score Distribution (GSD): Provides an underdispersed continuation of the beta-binomial by parameterizing the mean and variance to fully cover the feasible mean-variance space on finite discrete supports. The regime $0 \leq \rho < C(\psi)$ recovers a reparameterized beta-binomial; for $\rho \geq C(\psi)$, the GSD uses a mixture of binomial and deterministic components, enabling modeling of both over- and under-dispersion beyond the beta-binomial's variance floor (Ćmiel et al., 2022).

| Model | Dispersion Regime | Extra Parameters |
|---|---|---|
| Beta-binomial | Overdispersion | $(\alpha, \beta)$ |
| Tilted beta-binomial | Enhanced overdispersion | $(\mu_t, \mu_b, \phi, \theta)$ |
| GSD | Under- and overdispersion | $(\psi, \rho)$ |

4. Multivariate Beta-Binomial and High-Dimensional Inference

The multivariate beta-binomial (Westphal, 2019) models $m$ correlated binomial marginals by inducing an $m$-dimensional beta distribution as the push-forward of a $2^m$-dimensional Dirichlet. The marginal distributions are beta-binomial, but with controlled correlations:

$$Y_j = \sum_{i=1}^n X_{i,j}, \qquad \theta_j \sim \mathrm{Beta}(\alpha_j, \beta_j), \qquad j = 1, \dots, m.$$

Posterior updates and credible region construction use either the full Dirichlet structure or a reduced parameterization based on first and second moments. For $m > 10$, computational tractability favors the latter. Gaussian copulas are used to approximate the marginals and the joint distribution; they outperform normal approximations for small samples or many dimensions and ensure that credible regions remain within the valid parameter space.
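To make the push-forward concrete, the following sketch simulates the bivariate case ($m = 2$), where the Dirichlet lives on the $2^2 = 4$ joint outcome cells of a single trial. It is an illustration under stated assumptions, not Westphal's implementation: the cell labelling, the concentration values in `gamma`, and the function name are all hypothetical.

```python
import random


def sample_bivariate_betabinom(n, gamma, rng):
    """Draw one pair (Y1, Y2) from a bivariate beta-binomial.

    gamma maps each joint outcome cell (b1, b2) in {0, 1}^2 to a Dirichlet
    concentration parameter (an illustrative labelling).
    """
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    # Dirichlet draw via normalised independent Gamma variates.
    g = [rng.gammavariate(gamma[c], 1.0) for c in cells]
    total = sum(g)
    q = [v / total for v in g]
    # Each of the n trials falls into one cell; X_{i,j} is bit j of that cell.
    draws = rng.choices(cells, weights=q, k=n)
    y1 = sum(b1 for b1, _ in draws)
    y2 = sum(b2 for _, b2 in draws)
    return y1, y2


rng = random.Random(0)
gamma = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
n = 20
samples = [sample_bivariate_betabinom(n, gamma, rng) for _ in range(5000)]

# Marginal beta parameters are sums of cell concentrations.
alpha1 = gamma[(1, 0)] + gamma[(1, 1)]
beta1 = gamma[(0, 0)] + gamma[(0, 1)]

mean_y1 = sum(y1 for y1, _ in samples) / len(samples)
mean_y2 = sum(y2 for _, y2 in samples) / len(samples)
cov = sum((y1 - mean_y1) * (y2 - mean_y2) for y1, y2 in samples) / len(samples)

print(mean_y1, n * alpha1 / (alpha1 + beta1), cov)
```

Summing cell concentrations gives the marginal parameters, here $\theta_1 \sim \mathrm{Beta}(4, 3)$, while the mass placed on the concordant cells $(0,0)$ and $(1,1)$ controls the correlation between $Y_1$ and $Y_2$.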

5. Risk, Dominance, and Predictive Performance

Under Kullback–Leibler divergence loss, the total risk for the beta-binomial Bayes predictive decomposes into a sum of one-step-ahead sub-risks (Hamura, 2021):

$$R_{m,n}(p, \pi) = \sum_{i=0}^{m-1} R_{1, n+i}(p, \pi)$$

For restricted parameter regions, truncated beta priors can yield predictive estimators with uniformly lower risk than the unrestricted beta prior, subject to conditions on the truncation region (e.g., for $p \in (0,\kappa]$ when $\kappa < (n+\alpha)/(n+\alpha+\beta)$). This dominance is evidenced both theoretically and numerically, with similar conditions for two-sided truncation (Hamura, 2021).
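The decomposition can be checked numerically for an untruncated beta prior. The sketch below assumes the risk is defined as the expected Kullback–Leibler divergence from the true $\mathrm{Binomial}(m, p)$ to the Bayes predictive $f(\cdot \mid X)$ with $X \sim \mathrm{Binomial}(n, p)$; that reading of $R_{m,n}$, the function names, and the example values are assumptions for illustration.

```python
from math import comb, exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(Y = k) for Y ~ BetaBinomial(n, alpha, beta), computed in log space."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_num = lgamma(k + alpha) + lgamma(n - k + beta) - lgamma(n + alpha + beta)
    log_den = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    return exp(log_comb + log_num - log_den)


def kl_risk(m: int, n: int, p: float, alpha: float, beta: float) -> float:
    """Expected KL divergence between Binomial(m, p) and the Bayes predictive."""
    risk = 0.0
    for x in range(n + 1):
        px = binom_pmf(x, n, p)
        for y in range(m + 1):
            py = binom_pmf(y, m, p)
            q = betabinom_pmf(y, m, x + alpha, n - x + beta)
            risk += px * py * log(py / q)
    return risk


# Illustrative values: true p = 0.3, prior Beta(2, 3), n = 8 observed, m = 4 future.
p, alpha, beta, n, m = 0.3, 2.0, 3.0, 8, 4
total = kl_risk(m, n, p, alpha, beta)
one_step_sum = sum(kl_risk(1, n + i, p, alpha, beta) for i in range(m))

print(total, one_step_sum)
```

The two printed values should agree up to floating-point error, reflecting the chain rule for KL divergence applied to the sequence of future trials.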

6. Algorithmic, Computational, and Application Contexts

In sequential decision-making, the beta-binomial is central for online optimization scenarios such as multi-armed bandit problems, providing analytic expressions (in terms of ${}_2F_1$ hypergeometric functions) for comparing payout rates between stochastic processes (Sharony, 2017). Compared to brute-force Monte Carlo, such analytic reductions offer orders-of-magnitude speedups at the same numerical precision.
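As an illustration of the analytic-versus-Monte-Carlo trade-off, the sketch below computes $P(p_A > p_B)$ for two independent beta posteriors. It does not reproduce the ${}_2F_1$ expression from Sharony (2017). Instead it swaps in an elementary identity that holds when $\alpha_B$ and $\beta_B$ are positive integers: the beta CDF equals a binomial tail probability, so the comparison probability is a beta-binomial tail sum. The function name and the example posteriors are assumptions.

```python
import random
from math import exp, lgamma


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(Y = k) for Y ~ BetaBinomial(n, alpha, beta), computed in log space."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_num = lgamma(k + alpha) + lgamma(n - k + beta) - lgamma(n + alpha + beta)
    log_den = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    return exp(log_comb + log_num - log_den)


def prob_a_beats_b(alpha_a: float, beta_a: float, alpha_b: int, beta_b: int) -> float:
    """P(p_A > p_B) for p_A ~ Beta(alpha_a, beta_a), p_B ~ Beta(alpha_b, beta_b).

    Requires integer alpha_b, beta_b >= 1. Uses
    P(p_B <= x) = P(Binomial(alpha_b + beta_b - 1, x) >= alpha_b)
    and then averages over x ~ Beta(alpha_a, beta_a).
    """
    n = alpha_b + beta_b - 1
    return sum(betabinom_pmf(k, n, alpha_a, beta_a) for k in range(alpha_b, n + 1))


# Hypothetical arms: A has 12 successes / 8 failures, B has 9 / 11 (uniform priors).
exact = prob_a_beats_b(13, 9, 10, 12)

# Brute-force Monte Carlo estimate of the same probability.
rng = random.Random(1)
draws = 20000
hits = sum(rng.betavariate(13, 9) > rng.betavariate(10, 12) for _ in range(draws))
mc = hits / draws

print(exact, mc)
```

The exact sum needs only $\alpha_B + \beta_B$ PMF evaluations, whereas the Monte Carlo error shrinks only as the inverse square root of the number of draws.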

Estimation for beta-binomial and GSD models can be handled by the method of moments, maximum likelihood, or Bayesian approaches. For GSD, explicit formulas for MLE and moment equations enable efficient parameter recovery; for regression and hierarchical modeling, standard MCMC packages such as OpenBUGS are used, leveraging closed-form PMFs (Cifuentes-Amado et al., 2019, Ćmiel et al., 2022).
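For the plain beta-binomial with a common, known $n$, the method of moments has a simple closed form obtained by inverting the mean and variance formulas. The sketch below is one such inversion, written for illustration; it is not taken from the cited papers, and the function names and example counts are hypothetical.

```python
from statistics import fmean, pvariance


def betabinom_mom(mean: float, var: float, n: int) -> tuple[float, float]:
    """Method-of-moments estimates (alpha, beta) from a mean and variance.

    Uses Var = n*mu*(1-mu)*(1 + (n-1)*rho) with mu = alpha/(alpha+beta)
    and rho = 1/(alpha+beta+1). Requires n > 1.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to identify the dispersion")
    mu = mean / n
    rho = (var / (n * mu * (1 - mu)) - 1) / (n - 1)
    if not 0 < rho < 1:
        raise ValueError("moments are not compatible with a beta-binomial")
    s = 1 / rho - 1  # estimate of alpha + beta
    return mu * s, (1 - mu) * s


def betabinom_mom_from_data(counts: list[int], n: int) -> tuple[float, float]:
    """Plug sample moments of observed counts into betabinom_mom."""
    return betabinom_mom(fmean(counts), pvariance(counts), n)


# Exact moments of BetaBinomial(10, 2, 3) are mean 4 and variance 6.
alpha_hat, beta_hat = betabinom_mom(4.0, 6.0, 10)
print(alpha_hat, beta_hat)

# Hypothetical overdispersed counts out of n = 10 trials each.
counts = [0, 1, 2, 2, 3, 5, 6, 8, 9, 10]
print(betabinom_mom_from_data(counts, 10))
```

When the sample variance does not exceed the binomial variance, the estimator has no valid solution, so the function raises an error rather than returning negative parameters.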

The beta-binomial forms a fundamental predictive tool in Bayesian statistics, supports robust inference under uncertainty and overdispersion, and, through its extensions, provides comprehensive modeling of discrete count data in both univariate and multivariate high-dimensional settings.
