
Hierarchical Bayesian eBExVar Model

Updated 16 December 2025
  • eBExVar is a hierarchical Bayesian model that extends regression frameworks by incorporating flexible variable selection mechanisms like spike-and-slab priors.
  • It uses a mean-field variational inference procedure with closed-form updates to efficiently optimize the evidence lower bound for complex grouped data.
  • The model supports adaptive shrinkage at both group and global levels, enabling principled inference and model selection in multi-level structures.

A hierarchical Bayesian model (eBExVar) extends classical Bayesian hierarchical linear regression by incorporating flexible variable selection mechanisms, such as spike-and-slab priors or inclusion indicators, and can be generalized to more complex hierarchical structures and nonconjugate likelihoods. This class of models enables structured regularization and principled inference for multi-level, grouped data, allowing adaptive shrinkage and selection of relevant predictors at both group and global levels (Becker, 2018).

1. Hierarchical Bayesian Linear Regression: Model Structure

The foundational model considers data collected for $C$ groups, each with $M$ observations. For each group $i$, the measurements consist of covariates $x_{i,m} \in \mathbb{R}^D$ and responses $y_{i,m} \in \mathbb{R}$. The generative process is defined as follows:

  • Observation model: $y_{i,m} \sim \mathcal{N}(x_{i,m}^\top \beta_i, \sigma^{-1})$
  • Group coefficients: $\beta_i \sim \mathcal{N}(\Delta, s^{-1} I_D)$
  • Global mean: $\Delta \sim \mathcal{N}(0, W^{-1})$, with $W = \mathrm{diag}(w_1, \dots, w_D)$
  • Precisions:
    • $\sigma \sim \mathrm{Gamma}(a_0, b_0)$ (noise precision)
    • $s \sim \mathrm{Gamma}(c_0, d_0)$ (group-coefficient prior precision)
    • $w_d \sim \mathrm{Gamma}(e_0, f_0)$, $d = 1, \dots, D$ (ARD precision for each coefficient)

The joint density has the factorization:

$$p(y, \beta, \Delta, s, \sigma, w) = \Bigg[\prod_{i,m} \mathcal{N}(y_{i,m} \mid x_{i,m}^\top \beta_i, \sigma^{-1})\Bigg] \Bigg[\prod_i \mathcal{N}(\beta_i \mid \Delta, s^{-1} I)\Bigg] \mathcal{N}(\Delta \mid 0, W^{-1}) \cdot \mathrm{Gamma}(\sigma)\, \mathrm{Gamma}(s) \prod_d \mathrm{Gamma}(w_d)$$

This structure allows the model to borrow strength across groups while adapting to group-specific heterogeneity.
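The generative process above can be simulated directly. The following sketch draws from each level of the hierarchy in order; the hyperparameter values are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
C, M, D = 5, 40, 3  # groups, observations per group, covariate dimension

# Gamma hyperparameters (hypothetical values for illustration)
a0, b0 = 2.0, 1.0   # noise precision sigma
c0, d0 = 2.0, 1.0   # group-coefficient precision s
e0, f0 = 2.0, 1.0   # ARD precisions w_d

# Draw precisions, then the global mean and group coefficients
# (NumPy's gamma takes shape and scale, so rate b0 becomes scale 1/b0)
sigma = rng.gamma(a0, 1.0 / b0)
s = rng.gamma(c0, 1.0 / d0)
w = rng.gamma(e0, 1.0 / f0, size=D)
Delta = rng.normal(0.0, np.sqrt(1.0 / w))                 # Delta ~ N(0, W^{-1})
beta = rng.normal(Delta, np.sqrt(1.0 / s), size=(C, D))   # beta_i ~ N(Delta, s^{-1} I)

# Observation model: y_{i,m} ~ N(x_{i,m}^T beta_i, sigma^{-1})
X = rng.normal(size=(C, M, D))
y = np.einsum('cmd,cd->cm', X, beta) + rng.normal(0.0, np.sqrt(1.0 / sigma), size=(C, M))
```

Because all groups share $\Delta$ and $s$, the simulated $\beta_i$ cluster around a common center, which is exactly the structure that lets the posterior borrow strength across groups.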

2. Variational Inference and Mean-Field Factorization

Due to the computational expense of Markov Chain Monte Carlo in large hierarchical models, a mean-field variational inference approximation is employed. The variational family factorizes as:

$$q(\beta, \Delta, \sigma, s, w) = \left[\prod_{i=1}^C q(\beta_i)\right] q(\Delta)\, q(\sigma)\, q(s) \prod_{d=1}^D q(w_d)$$

The approximate posterior is optimized by maximizing the evidence lower bound (ELBO):

$$\mathcal{L}(q) = \mathbb{E}_q[\ln p(y, \beta, \Delta, \sigma, s, w)] - \mathbb{E}_q[\ln q(\beta, \Delta, \sigma, s, w)]$$

with explicit decomposition into expected log-joint probabilities and entropies for each factor. All $q$-factors adopt conjugate forms (Normal or Gamma), and closed-form coordinate ascent updates are available:

  • $q(\beta_i) = \mathcal{N}(\mu_{\beta_i}, \Sigma_{\beta_i})$
  • $q(\Delta) = \mathcal{N}(\mu_\Delta, \Sigma_\Delta)$
  • $q(\sigma) = \mathrm{Gamma}(a_n, b_n)$
  • $q(s) = \mathrm{Gamma}(c_n, d_n)$
  • $q(w_d) = \mathrm{Gamma}(e_n, f_{n,d})$

Explicit update formulas for all variational parameters are derived, exploiting expectations under the current $q$ distributions.
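For instance, the update for each $q(\beta_i)$ takes the standard Bayesian-linear-regression form: the posterior precision combines the expected noise precision $\mathbb{E}[\sigma]$ with the expected group-prior precision $\mathbb{E}[s]$. A minimal sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def update_q_beta_i(X_i, y_i, E_sigma, E_s, E_Delta):
    """Closed-form update for q(beta_i) = N(mu, Sigma).

    X_i: (M, D) covariates for group i; y_i: (M,) responses.
    E_sigma, E_s: current expectations of the noise and group precisions.
    E_Delta: current expectation of the global mean Delta.
    """
    D = X_i.shape[1]
    # Posterior precision: E[sigma] X^T X from the likelihood plus E[s] I from the prior
    prec = E_sigma * X_i.T @ X_i + E_s * np.eye(D)
    Sigma = np.linalg.inv(prec)
    # Posterior mean: data term shrunk toward the expected global mean
    mu = Sigma @ (E_sigma * X_i.T @ y_i + E_s * E_Delta)
    return mu, Sigma
```

As $\mathbb{E}[s]$ grows, the update shrinks each $\mu_{\beta_i}$ harder toward $\mathbb{E}[\Delta]$; as $\mathbb{E}[\sigma]$ grows, the data term dominates.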

3. Computational Complexity and Algorithmic Structure

The full coordinate ascent variational inference algorithm iteratively updates each $q$-factor given the current estimates of expectations from the others, following:

Initialize all variational parameters
repeat
  for i=1…C do
    compute E[σ], E[s], E[Δ]
    update Σ_{β_i}, μ_{β_i}
  end for
  compute E[β_i], Σ_{β_i}
  update Σ_Δ, μ_Δ
  update a_n, b_n for q(σ)
  update c_n, d_n for q(s)
  for d=1…D do
    update e_n, f_{n,d} for q(w_d)
  end for
  optionally compute ELBO
until ELBO converged

The complexity per sweep is dominated by the updates of the $\beta_i$, requiring $O(MD^2)$ per group when inputs are dense, for a total of $O(CMD^2 + D^3)$ (the $D^3$ term arising from inverting the shared covariance of $\Delta$). This is efficient for moderate $D$ or when sufficient statistics are pre-computed.
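The pseudocode above can be realized as a single-sweep function. The following sketch follows standard conjugate-update algebra for this model; the state-dictionary layout and hyperparameter defaults are our own conventions, not the paper's code:

```python
import numpy as np

def cavi_sweep(X, y, st, a0=2., b0=1., c0=2., d0=1., e0=2., f0=1.):
    """One coordinate-ascent sweep. X: (C, M, D), y: (C, M); `st` holds
    the variational parameters for all factors."""
    C, M, D = X.shape
    E_sigma = st['a'] / st['b']
    E_s = st['c'] / st['d']
    E_w = st['e'] / st['f']                      # shape (D,), one ARD precision each

    # q(beta_i): O(M D^2) per group to form X_i^T X_i, plus a D x D inverse
    for i in range(C):
        prec = E_sigma * X[i].T @ X[i] + E_s * np.eye(D)
        st['Sigma_b'][i] = np.linalg.inv(prec)
        st['mu_b'][i] = st['Sigma_b'][i] @ (E_sigma * X[i].T @ y[i] + E_s * st['mu_D'])

    # q(Delta): diagonal ARD prior plus C pseudo-observations beta_i
    prec_D = np.diag(E_w) + C * E_s * np.eye(D)
    st['Sigma_D'] = np.linalg.inv(prec_D)
    st['mu_D'] = st['Sigma_D'] @ (E_s * st['mu_b'].sum(axis=0))

    # q(sigma): expected squared residuals include the x^T Sigma_b x correction
    resid = y - np.einsum('cmd,cd->cm', X, st['mu_b'])
    quad = np.einsum('cmd,cde,cme->cm', X, st['Sigma_b'], X)
    st['a'] = a0 + 0.5 * C * M
    st['b'] = b0 + 0.5 * (resid**2 + quad).sum()

    # q(s): expected squared deviation of each beta_i from Delta
    dev = ((st['mu_b'] - st['mu_D'])**2).sum()
    tr = np.trace(st['Sigma_b'], axis1=1, axis2=2).sum() + C * np.trace(st['Sigma_D'])
    st['c'] = c0 + 0.5 * C * D
    st['d'] = d0 + 0.5 * (dev + tr)

    # q(w_d): one Gamma update per coordinate of Delta
    st['e'] = e0 + 0.5
    st['f'] = f0 + 0.5 * (st['mu_D']**2 + np.diag(st['Sigma_D']))
    return st
```

Each sweep touches every factor exactly once, matching the loop structure of the pseudocode; repeated sweeps are run until the ELBO stabilizes.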

4. eBExVar Extensions: Variable Selection and Deeper Hierarchies

The eBExVar model generalizes the above by introducing additional structured flexibility:

  • Spike-and-slab or Bernoulli inclusion indicators $\gamma_d \in \{0,1\}$: for each predictor, $\gamma_d \sim \mathrm{Bernoulli}(\pi)$ and $\beta_{i,d} \sim \gamma_d\, \mathcal{N}(0, \tau^{-1}) + (1 - \gamma_d)\, \delta_0$, supporting variable selection at the model level.
  • Extended mean-field: a new variational factor $q(\gamma_d)$ is incorporated, with closed-form Bernoulli log-odds updates when conjugate.
  • Deeper hierarchical structure: additional levels are introduced (e.g., subject → item → context), each with appropriate Gaussian and Gamma priors, leading to multi-level exchangeable modeling.
  • Nonconjugate extensions: For models with, e.g., logistic link functions, either local variational bounds or black-box variational inference with Monte Carlo gradients replace closed-form updates.

Adaptations to the inference algorithm include additional coordinate updates for $\gamma_d$ and possible reparameterization of weight precisions (spike versus slab).
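The Bernoulli update for $q(\gamma_d)$ has a generic log-odds form: the prior log-odds of inclusion plus the expected change in the log-joint between slab and spike, evaluated under the current $q$ over all other factors. A hedged sketch of that form (the `delta_logjoint` computation is model-specific and not reproduced here):

```python
import numpy as np

def update_q_gamma(logit_pi, delta_logjoint):
    """Mean-field update for inclusion probabilities q(gamma_d = 1).

    logit_pi: prior log-odds log(pi / (1 - pi)).
    delta_logjoint: array where entry d is the expected log-joint
    difference between including predictor d (slab) and excluding it
    (spike), under the current q over the remaining factors.
    Generic sigmoid-of-log-odds form, not the paper's exact update.
    """
    log_odds = logit_pi + delta_logjoint
    return 1.0 / (1.0 + np.exp(-log_odds))   # posterior inclusion probabilities
```

When the evidence term is zero the update returns the prior probability, and strong positive evidence drives the inclusion probability toward one, which is the behavior any conjugate Bernoulli log-odds update must exhibit.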

5. Flexible Modeling and Practical Considerations

The hierarchical Bayesian model (eBExVar extension) accommodates:

  • Group-level and global shrinkage: adaptive sharing of information via $\Delta$ and ARD (automatic relevance determination) priors on $w_d$.
  • Principled model selection: via inclusion indicators $\gamma_d$ with Beta–Bernoulli hyperpriors, variable selection is integrated with estimation.
  • Scalability: The variational approximation and closed-form updates permit efficient inference at scale, in contrast to standard MCMC methods.
  • Generalizability: The framework can be extended for multi-level, nested, or crossed grouping structures with minimal adjustment to the variational inference pipeline (Becker, 2018).

A plausible implication is that this class of models provides a general, modular, and computationally efficient framework for hierarchical variable selection and regularized estimation in high-dimensional grouped data scenarios.

6. Connections and Further Extensions

The eBExVar model is positioned at the intersection of Bayesian variable selection, hierarchical modeling, and scalable variational inference. Possible directions for further development include:

  • Replacement of ARD-Gamma $w_d$ priors by mixtures of Gammas and point masses for spike-and-slab structures.
  • Extension to non-conjugate likelihoods (e.g., for classification tasks) via black-box variational inference.
  • Inclusion of multi-way random effects and crossed random structures through additional hierarchical plates.
  • Systematic monitoring of the ELBO as a convergence criterion, ensuring the variational posterior reaches a local optimum of the bound.
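The ELBO-monitoring practice mentioned above amounts to a simple driver loop: iterate sweeps and stop when the relative change in the bound falls below a tolerance. A minimal scaffold (the `step`/`elbo` callables and the tolerance are illustrative assumptions):

```python
def run_until_converged(step, elbo, max_iter=500, tol=1e-6):
    """Run CAVI until the relative ELBO change drops below `tol`.

    step():  performs one coordinate-ascent sweep (mutates state).
    elbo():  returns the current value of the bound.
    Returns the ELBO history, which should be non-decreasing for
    exact coordinate-ascent updates.
    """
    prev = None
    history = []
    for _ in range(max_iter):
        step()
        cur = elbo()
        history.append(cur)
        if prev is not None and abs(cur - prev) <= tol * (abs(prev) + 1e-12):
            break
        prev = cur
    return history
```

A non-decreasing history is also a useful debugging check: any decrease signals an error in one of the closed-form updates.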

This suite of models inherits the interpretability and flexibility of Bayesian hierarchical regression while supporting scalable, exact-analytic variational inference and integrative model selection (Becker, 2018).

References (1)
