
Hierarchical Bayesian Extension

Updated 18 November 2025
  • Hierarchical Bayesian extension is a framework that generalizes classical Bayesian models by incorporating multilevel dependency structures to model correlated proportions.
  • It employs reduced parametrization and copula-based approximations to efficiently capture first and second order statistics in high-dimensional, overdispersed or underdispersed settings.
  • The approach enables construction of multivariate credible regions and extends to the Generalised Score Distribution, facilitating practical inference with finite-support discrete data.

A hierarchical Bayesian extension generalizes classical Bayesian models to incorporate hierarchical or multilevel dependency structures in the underlying parameters. In the context of discrete support and binomial-type data, such extensions enable the modeling of correlated proportions, non-exchangeable groupings, and flexible mean-variance relations. Key frameworks include multivariate and hierarchical beta-binomial models, as well as their mean-variance and copula-based parametrizations. Recent work extends these constructions to both the overdispersed and underdispersed regimes and leverages reduced parametrization and copula theory to enable computation in high dimensions.

1. Foundations: Classical and Multivariate Beta-Binomial Frameworks

The starting point for hierarchical Bayesian modeling of proportions is the beta-binomial model. For $n$ binary trials, the probability of success $\vartheta$ is treated as a latent variable with a beta prior:

$$\vartheta \sim \mathrm{Beta}(\alpha, \beta), \qquad X_i \mid \vartheta \sim \mathrm{Bernoulli}(\vartheta)$$

The marginal likelihood and posterior are given by:

$$P(Y = y \mid n, \alpha, \beta) = \binom{n}{y} \frac{B(\alpha + y,\, \beta + n - y)}{B(\alpha, \beta)}$$

$$\vartheta \mid Y = y \sim \mathrm{Beta}(\alpha + y,\, \beta + n - y)$$

This model captures overdispersion relative to the binomial: the variance always exceeds $n\mu(1-\mu)$ unless the precision parameter $\alpha+\beta$ diverges (Westphal, 2019, Ćmiel et al., 2022, Cheraghchi, 2017).
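
The marginal pmf, conjugate update, and overdispersion property above can be sketched numerically; the values of $n$, $\alpha$, $\beta$ below are illustrative, not taken from any of the cited papers:

```python
# Minimal sketch of the beta-binomial marginal and its overdispersion,
# using scipy.stats.  n, a, b are illustrative parameter choices.
from scipy.stats import betabinom

n, a, b = 20, 2.0, 3.0
mu = a / (a + b)                       # E[theta] = alpha / (alpha + beta)

# Marginal P(Y = y | n, alpha, beta): the beta-binomial pmf.
pmf = [betabinom.pmf(y, n, a, b) for y in range(n + 1)]
assert abs(sum(pmf) - 1.0) < 1e-9

# The beta-binomial variance exceeds the binomial variance n*mu*(1-mu)
# for any finite precision a + b (overdispersion).
var_bb = betabinom.var(n, a, b)
var_bin = n * mu * (1 - mu)
print(var_bb, var_bin)
```

The conjugate posterior after observing $y$ successes is simply `Beta(a + y, b + n - y)`, so no numerical integration is ever needed for the marginals.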

The multivariate extension maps the joint distribution of $m$ Bernoulli variables to a marginal space via a linear transformation from a $2^m$-dimensional Dirichlet:

$$\mathbf{p} \sim \mathrm{Dirichlet}(\gamma_1, \dots, \gamma_{2^m}), \qquad \bm{\vartheta} = H \mathbf{p}$$

Here, $H$ encodes the mapping from latent categories to Bernoulli marginals, inducing an $m$-dimensional marginal distribution over $(0,1)^m$ (Westphal, 2019).
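
A small sketch of this construction for $m = 2$, with an illustrative concentration vector $\gamma$; the indicator structure of $H$ (entry $H_{jk} = 1$ iff variable $j$ equals 1 in joint outcome $k$) follows directly from $\vartheta_j$ being the marginal success probability:

```python
# Sketch: Dirichlet over the 2^m joint Bernoulli outcomes, mapped to
# marginal success probabilities by theta = H p.  gamma is illustrative.
import itertools
import numpy as np

m = 2
cells = list(itertools.product([0, 1], repeat=m))   # the 2^m joint outcomes
# H[j, k] = 1 iff variable j equals 1 in joint outcome k.
H = np.array([[cell[j] for cell in cells] for j in range(m)], dtype=float)

rng = np.random.default_rng(0)
gamma = np.array([1.0, 2.0, 3.0, 4.0])
p = rng.dirichlet(gamma, size=5000)                 # draws on the simplex
theta = p @ H.T                                     # induced marginals in (0,1)^m

# Each E[theta_j] equals (H gamma)_j / sum(gamma).
print(theta.mean(axis=0), H @ gamma / gamma.sum())
```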

2. Hierarchical Construction and Reduced Parametrization

As the full Dirichlet representation is computationally infeasible for large $m$, hierarchical Bayesian extensions favor reduced parametrizations that encode the structure through first and second moments. Specifically, the model can be described by the total mass $\nu = \sum_k \gamma_k$ and the matrix $A = H\,\mathrm{diag}(\gamma)\,H^\mathsf{T}$, which encode the first- and second-order sufficient statistics:

  • Mean: $E[\vartheta_j] = \alpha_j / \nu$, where $\alpha = H\gamma$
  • Covariance: $\mathrm{Cov}(\vartheta_j, \vartheta_{j'}) = \dfrac{\nu A_{jj'} - \alpha_j \alpha_{j'}}{\nu^2(\nu+1)}$
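
These moment formulas can be verified against the exact Dirichlet covariance pushed through the linear map $\bm{\vartheta} = H\mathbf{p}$; the values of $\gamma$ below are illustrative:

```python
# Sketch: recover the reduced-parametrization moments from nu and
# A = H diag(gamma) H^T, and check them against the exact Dirichlet
# covariance transformed by theta = H p.
import itertools
import numpy as np

m = 2
cells = list(itertools.product([0, 1], repeat=m))
H = np.array([[c[j] for c in cells] for j in range(m)], dtype=float)
gamma = np.array([1.0, 2.0, 3.0, 4.0])

nu = gamma.sum()
alpha = H @ gamma                          # alpha_j = (H gamma)_j
A = H @ np.diag(gamma) @ H.T

mean = alpha / nu                          # E[theta_j]
cov = (nu * A - np.outer(alpha, alpha)) / (nu**2 * (nu + 1))

# Exact Dirichlet covariance of p, pushed through the linear map:
cov_p = (np.diag(gamma) * nu - np.outer(gamma, gamma)) / (nu**2 * (nu + 1))
print(np.allclose(cov, H @ cov_p @ H.T))   # the two expressions agree
```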

The posterior under multinomial sampling with a Dirichlet prior, given observed cell counts $\mathbf{d}$, has straightforward updates:

$$\nu^* = \nu + n, \qquad A^* = A + H\,\mathrm{diag}(\mathbf{d})\,H^\mathsf{T}$$

This approach allows joint shrinkage estimation of the mean vector and covariance structure without the full $2^m$ parametrization (Westphal, 2019).
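
The conjugate update can be written in a few lines; the prior $\gamma$ and the cell counts $\mathbf{d}$ below are hypothetical, and the final check confirms that updating $(\nu, A)$ matches updating the full Dirichlet $\gamma^* = \gamma + \mathbf{d}$:

```python
# Sketch of the conjugate update for the reduced parametrization:
# observed cell counts d over the 2^m joint outcomes update (nu, A).
import itertools
import numpy as np

m = 2
cells = list(itertools.product([0, 1], repeat=m))
H = np.array([[c[j] for c in cells] for j in range(m)], dtype=float)

gamma = np.ones(2**m)                  # illustrative uniform Dirichlet prior
nu, A = gamma.sum(), H @ np.diag(gamma) @ H.T

d = np.array([10.0, 5.0, 3.0, 2.0])    # hypothetical observed cell counts
n = d.sum()

nu_post = nu + n                       # nu* = nu + n
A_post = A + H @ np.diag(d) @ H.T      # A* = A + H diag(d) H^T

# Equivalent to updating the full Dirichlet, gamma* = gamma + d:
print(np.allclose(A_post, H @ np.diag(gamma + d) @ H.T))
```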

3. Copula and Gaussian Approximations

Exact joint posteriors for $\bm{\vartheta}$ are generally unavailable in closed form. Hierarchical models address this with a copula approximation that retains the exact marginal Beta posteriors and captures dependencies through the posterior correlation matrix $R^*$:

$$\tilde f(\theta_1, \ldots, \theta_m) = c_{R^*}\bigl(F_1(\theta_1), \ldots, F_m(\theta_m)\bigr) \prod_{j=1}^m f_j(\theta_j)$$

Here, $F_j$ and $f_j$ are the Beta CDF and density of the $j$th margin, and $c_{R^*}$ is the Gaussian copula density. For computational efficiency in very high dimensions, a normal approximation $N_m(\mu^*, \Sigma^*)$ may be used instead, but boundary-respecting copula regions are typically preferred, especially for small $n$ or parameters near the boundary of the unit cube (Westphal, 2019).
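
Sampling from this Gaussian-copula approximation is straightforward: draw correlated normals, push them to uniforms, then through the Beta inverse CDFs. The marginal Beta parameters and $R^*$ below are illustrative stand-ins for a fitted posterior:

```python
# Sketch: draws from a Gaussian copula with exact Beta marginals.
# post and R_star are illustrative, not fitted values.
import numpy as np
from scipy.stats import norm, beta

rng = np.random.default_rng(1)
post = [(5.0, 3.0), (8.0, 2.0)]              # Beta(alpha*, beta*) marginals
R_star = np.array([[1.0, 0.4],
                   [0.4, 1.0]])              # posterior correlation matrix

z = rng.multivariate_normal(np.zeros(2), R_star, size=20000)
u = norm.cdf(z)                              # uniform margins, copula dependence
theta = np.column_stack(
    [beta.ppf(u[:, j], a, b) for j, (a, b) in enumerate(post)]
)

# Margins stay exactly Beta; sample means approach alpha/(alpha+beta).
print(theta.mean(axis=0))
```

All draws stay strictly inside $(0,1)^m$, which is the boundary-respecting property the text emphasizes.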

4. Multivariate Credible Regions and Coverage Properties

Hierarchical Bayesian extensions facilitate the construction of multivariate credible regions for simultaneous inference. Approaches include:

  • Full model (via Dirichlet sampling): empirical quantiles of $\bm{\vartheta}^{(r)}$ over Dirichlet posterior samples
  • Copula-based: hyperrectangular regions built from Beta marginal quantiles, with joint coverage set by the threshold $c_\alpha$ solving $\Pr(\max_j |Z_j| \le c_\alpha) = 1-\alpha$ for $Z \sim N_m(0, R^*)$
  • Normal approximation: rectangles based on the mean and covariance of $N_m(\mu^*, \Sigma^*)$

Empirical results indicate that the copula and full-model approaches yield Bayes coverage close to the target level, outperforming the normal approximation in small-sample or near-boundary scenarios (Westphal, 2019).
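
The copula-based rectangle can be sketched by solving the coverage equation for $c_\alpha$ by Monte Carlo and mapping the result through the Beta marginal quantiles; all posterior parameters below are illustrative:

```python
# Sketch: copula-based simultaneous credible rectangle.  Find c_alpha
# with Pr(max_j |Z_j| <= c_alpha) = 1 - alpha by Monte Carlo, then map
# Phi(+/- c_alpha) through the Beta marginal quantile functions.
import numpy as np
from scipy.stats import norm, beta

rng = np.random.default_rng(2)
post = [(5.0, 3.0), (8.0, 2.0)]              # illustrative Beta marginals
R_star = np.array([[1.0, 0.4],
                   [0.4, 1.0]])
alpha_level = 0.05

z = rng.multivariate_normal(np.zeros(2), R_star, size=100000)
max_abs = np.abs(z).max(axis=1)
c_alpha = np.quantile(max_abs, 1 - alpha_level)   # solves the coverage equation

# Hyperrectangle: per-margin Beta quantiles at Phi(-c) and Phi(c).
lo_u, hi_u = norm.cdf(-c_alpha), norm.cdf(c_alpha)
region = [(beta.ppf(lo_u, a, b), beta.ppf(hi_u, a, b)) for a, b in post]
print(c_alpha, region)
```

Because the dependence enters only through $c_\alpha$, the rectangle is wider than the marginal $1-\alpha$ intervals but narrower than a Bonferroni correction would give.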

5. Extensions Beyond Overdispersion: The Generalised Score Distribution

Classical Bayesian hierarchical models using the beta-binomial component are limited to regimes overdispersed relative to the binomial. The Generalised Score Distribution (GSD) extends this by providing a two-parameter family on the finite support $\{0, \ldots, m\}$ with:

$$\mu = E[U] \in [0, m], \qquad \delta \in [0, 1]$$

For fixed $\mu$, $\delta$ interpolates the variance from the minimum (attained by a two-point distribution on the integers adjacent to $\mu$) to the maximum (attained by placing all mass on the extremes $0$ and $m$), covering the entire feasible variance interval $[V_{\min}(\mu), V_{\max}(\mu)]$. For $\delta \leq C(\mu)$, the GSD matches a reparametrized beta-binomial; for $\delta > C(\mu)$, it continues smoothly into the underdispersed regime, which is unattainable by classical beta-binomial models. Estimation is feasible via the method of moments or maximum likelihood (Ćmiel et al., 2022).
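
The endpoints of the feasible variance interval follow from the two extremal two-point laws and can be checked directly; the values of $\mu$ and $m$ below are illustrative:

```python
# Sketch: the feasible variance interval [V_min, V_max] for a mean-mu
# distribution on {0, ..., m}.  V_max comes from the two-point law on
# {0, m}; V_min from mass on the integers adjacent to mu.
import math

def variance_bounds(mu: float, m: int):
    """Return (V_min, V_max) for distributions on {0,...,m} with mean mu."""
    v_max = mu * (m - mu)                  # all mass at the extremes 0 and m
    frac = mu - math.floor(mu)
    v_min = frac * (1 - frac)              # mass on floor(mu) and ceil(mu)
    return v_min, v_max

# Check V_max against the explicit two-point law on {0, m}:
mu, m = 3.3, 5
p_m = mu / m                               # P(U = m); P(U = 0) = 1 - p_m
v_two_point = p_m * (m - mu) ** 2 + (1 - p_m) * mu ** 2
v_min, v_max = variance_bounds(mu, m)
print(v_min, v_max)
```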

6. Entropy and Analytic Properties

The entropy of hierarchical Bayesian extensions with beta-binomial marginals, and of their generalizations, can be expressed both as a series and as an integral. For the beta-binomial law $X \sim \mathrm{BetaBin}(n, \alpha, \beta)$, the entropy is:

$$H(X) = -\ln\bigl(n B(\alpha,\beta)\bigr) + \sum_{j=2}^{n} \binom{n}{j}(-1)^j \left[ \frac{(\alpha)_j}{(\alpha+\beta)_j}\bigl(c(j)-c_\alpha(j)\bigr) + \frac{(\beta)_j}{(\alpha+\beta)_j}\bigl(c(j)-c_\beta(j)\bigr) \right]$$

where $(\cdot)_j$ denotes the Pochhammer symbol and $c_\alpha(j)$, $c_\beta(j)$ are “difference-coefficient” sequences related to the Riemann and Hurwitz $\zeta$ functions. An equivalent integral representation involving the hypergeometric function and the Lerch transcendent is also available (Cheraghchi, 2017). These analytic expressions facilitate precise computation of information-theoretic quantities in hierarchical models.
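
Since the support is finite, any closed-form expression for this entropy can be cross-checked by direct summation of $-\sum_y p(y)\ln p(y)$; the parameters below are illustrative:

```python
# Sketch: direct evaluation of the beta-binomial entropy over its finite
# support, compared with scipy's built-in entropy (both in nats).
import math
from scipy.stats import betabinom

n, a, b = 10, 2.0, 3.0
pmf = [betabinom.pmf(y, n, a, b) for y in range(n + 1)]
H_direct = -sum(p * math.log(p) for p in pmf if p > 0)

print(H_direct, float(betabinom.entropy(n, a, b)))
```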

7. Bernoulli-Sum Interpretation and Practical Inference

Every GSD, covering both the extended beta-binomial regime and its underdispersed continuation, admits a representation as a sum of $m$ dichotomous (potentially dependent) random variables, generalizing the de Finetti (Beta-mixing) representation. For $\delta \leq C(\mu)$, the classical beta-binomial structure reappears, while for $\delta > C(\mu)$, the law transitions to mixtures incorporating underdispersed two-point and binomial components. Practical inference employs moment matching, maximum likelihood, and boundary-respecting copula regions to handle high dimensionality and maintain computational tractability (Ćmiel et al., 2022, Westphal, 2019).
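
Moment matching for the beta-binomial part of the family reduces to two equations. A minimal sketch, assuming the standard $(\mu, \rho)$ parametrization with $\mathrm{Var} = n\mu(1-\mu)\bigl(1 + (n-1)\rho\bigr)$ and $\rho = 1/(\alpha+\beta+1)$ (the helper name `fit_betabin_mom` is hypothetical):

```python
# Sketch: method-of-moments estimation of (alpha, beta) for a
# beta-binomial sample of counts on {0, ..., n}.
import numpy as np
from scipy.stats import betabinom

def fit_betabin_mom(counts: np.ndarray, n: int):
    """Match sample mean and variance to the beta-binomial moments."""
    mu_hat = counts.mean() / n                       # per-trial success rate
    s2 = counts.var()
    # Var = n*mu*(1-mu)*(1 + (n-1)*rho), with rho = 1/(alpha+beta+1).
    rho = (s2 / (n * mu_hat * (1 - mu_hat)) - 1) / (n - 1)
    rho = min(max(rho, 1e-6), 1 - 1e-6)              # keep in overdispersed range
    prec = 1 / rho - 1                               # alpha + beta
    return mu_hat * prec, (1 - mu_hat) * prec

rng = np.random.default_rng(3)
n, a_true, b_true = 20, 2.0, 3.0
counts = betabinom.rvs(n, a_true, b_true, size=50000, random_state=rng)
a_hat, b_hat = fit_betabin_mom(counts, n)
print(a_hat, b_hat)    # close to (2, 3) for large samples
```

For the underdispersed regime ($\delta > C(\mu)$), moment matching proceeds analogously on the GSD's $(\mu, \delta)$ scale rather than through $(\alpha, \beta)$.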


Summary Table: Hierarchical Bayesian Extensions for Discrete Data

| Feature | Standard Beta-Binomial | Multivariate/Hierarchical Bayesian Extension | Generalised Score Distribution (GSD) |
|---|---|---|---|
| Dispersion control | Overdispersion | Overdispersion, multivariate dependence | Over- and underdispersion |
| Parametrization | $(\alpha,\beta)$ or $(\mu,\phi)$ | Dirichlet ($2^m$) or reduced $(\nu, A)$ | $(\mu, \delta)$ |
| Computational feasibility | High | Moderate (low $m$); reduced and copula for high $m$ | High |
| Entropy, info-theoretic properties | Series/integral formulas | Copula- and Dirichlet-based | Admits Bernoulli-sum representation |
| Simultaneous inference | Marginal | Multivariate credible regions (copula, normal, full) | General mean-variance coverage |

Hierarchical Bayesian extensions thus provide a rigorous basis for the joint modeling of correlated proportions and support the construction of credible regions, flexible mean-variance interpolation, and tractable computation in high dimensions. Recent developments, including the GSD, address both overdispersed and underdispersed settings, expand the representational range, and offer robust inference tools for practical applications involving finite-support discrete data (Westphal, 2019, Ćmiel et al., 2022, Cheraghchi, 2017).
