Hierarchical Variational Family
- A hierarchical variational family is a class of approximating distributions that decouples global dependencies from local flexibility in hierarchical Bayesian models.
- It employs a Gaussian copula to model multivariate dependencies while using Bernstein polynomial transformations for flexible, nonparametric marginal approximations.
- The approach achieves higher ELBO values and improved uncertainty quantification compared to mean-field and fixed-form methods, while remaining scalable to complex inference tasks.
A hierarchical variational family is a class of approximating distributions used in variational inference that is specifically designed to capture the complex dependency structures and non-Gaussian marginal shapes commonly arising in hierarchical Bayesian models. Rather than relying on restrictive assumptions such as mean-field factorization or fixed-form Gaussian approximations, hierarchical variational families explicitly decouple global dependence from local flexibility, often leveraging copula constructions, auxiliary variable hierarchies, or nonparametric marginal transformations. This approach enables accurate modeling of correlated, skewed, heavy-tailed, or multimodal posterior distributions, particularly in settings where latent variables exhibit pronounced interactions or the posterior geometry is highly non-standard.
1. Posterior Decoupling via Copula Construction
The hierarchical variational family introduced by Variational Gaussian Copula (VGC) inference (Han et al., 2015) starts from the observation that, by Sklar's theorem, any multivariate distribution can be decomposed into a copula that captures global dependencies and a collection of marginal densities:

$$p(x_1, \dots, x_d) = c_p\big(F_1(x_1), \dots, F_d(x_d)\big) \prod_{j=1}^{d} p_j(x_j).$$

Here the true (but intractable) marginal CDFs $F_j$ are paired with an (unknown) copula density $c_p$ encapsulating the full dependency structure. VGC posits a variational density mirroring this form:

$$q(x_1, \dots, x_d) = c_q\big(Q_1(x_1), \dots, Q_d(x_d); \Upsilon\big) \prod_{j=1}^{d} q_j(x_j),$$

where $Q_j$ denotes the CDF of the variational marginal $q_j$. In the VGC variational family, the copula $c_q$ is chosen as a parametric Gaussian copula with correlation matrix $\Upsilon$, while the marginal densities $q_j$ are allowed to be highly flexible and nonparametric.
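To make the decoupling concrete, the sketch below (Python with JAX; the exponential and lognormal marginals and all function names are illustrative choices, not taken from the paper) samples from a Gaussian-copula family: the correlation matrix controls all of the dependence, while the inverse marginal CDFs control all of the shape.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import ndtr, ndtri  # standard normal CDF and its inverse

def sample_gaussian_copula(key, corr, marginal_inv_cdfs, n_samples):
    """Draw samples from a Gaussian copula paired with arbitrary marginals."""
    d = corr.shape[0]
    L = jnp.linalg.cholesky(corr)
    eps = jax.random.normal(key, (n_samples, d))
    z = eps @ L.T                # z ~ N(0, corr): encodes the global dependence
    u = ndtr(z)                  # uniform marginals; dependence is preserved
    # each marginal's inverse CDF imposes its own, possibly non-Gaussian, shape
    cols = [inv_cdf(u[:, j]) for j, inv_cdf in enumerate(marginal_inv_cdfs)]
    return jnp.stack(cols, axis=1)

corr = jnp.array([[1.0, 0.7], [0.7, 1.0]])
marginals = [
    lambda u: -jnp.log1p(-u) / 2.0,     # Exponential(rate=2) inverse CDF
    lambda u: jnp.exp(0.5 * ndtri(u)),  # LogNormal(0, 0.5) inverse CDF
]
x = sample_gaussian_copula(jax.random.PRNGKey(0), corr, marginals, 10_000)
print(jnp.corrcoef(x.T))  # strong positive dependence despite skewed marginals
```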
2. Gaussian Copula for Modeling Dependency Structure
The use of a Gaussian copula is motivated by its ability to preserve multivariate dependencies regardless of the specific form of the marginal distributions. The transformation is realized by mapping each $x_j$ through its marginal CDF to $u_j = Q_j(x_j) \in [0, 1]$, followed by the inverse standard normal CDF to obtain $\eta_j = \Phi^{-1}(u_j)$. The Gaussian copula density is then written

$$c(u_1, \dots, u_d; \Upsilon) = |\Upsilon|^{-1/2} \exp\!\Big( -\tfrac{1}{2}\, \eta^\top \big( \Upsilon^{-1} - I_d \big)\, \eta \Big),$$

where $\Upsilon$ is a correlation matrix and $\eta = \big(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d)\big)^\top$. This structure leaves the univariate marginals free to be non-Gaussian while enforcing a consistent model of their dependence.
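As a sanity check (our own, not from the paper), the density above can be verified numerically: for a bivariate normal, the Gaussian copula density multiplied by the standard normal marginal densities must reproduce the joint density exactly, since the normal is its own copula model.

```python
import jax.numpy as jnp
from jax.scipy.special import ndtri
from jax.scipy.stats import multivariate_normal, norm

def gaussian_copula_logpdf(u, corr):
    """log c(u; corr) = -0.5*log|corr| - 0.5 * eta^T (corr^{-1} - I) eta."""
    eta = ndtri(u)
    prec_minus_eye = jnp.linalg.inv(corr) - jnp.eye(corr.shape[0])
    _, logdet = jnp.linalg.slogdet(corr)
    return -0.5 * logdet - 0.5 * eta @ prec_minus_eye @ eta

rho = 0.8
corr = jnp.array([[1.0, rho], [rho, 1.0]])
x = jnp.array([0.3, -1.2])

lhs = multivariate_normal.logpdf(x, jnp.zeros(2), corr)  # log joint density
rhs = (gaussian_copula_logpdf(norm.cdf(x), corr)         # log copula density
       + norm.logpdf(x).sum())                           # + log marginal densities
print(lhs, rhs)  # agree up to floating-point error
```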
3. Flexible Marginals via Bernstein Polynomial Transformations
VGC achieves flexibility in the univariate marginals through a semiparametric bijective transformation employing "sandwich" Bernstein polynomials and monotonic transfer functions. Writing $z_j$ for the underlying Gaussian coordinate (marginally standard normal under the copula), the transformation is defined as

$$x_j = f_j(z_j) = h_j\big( B_j\big( \Phi(z_j) \big) \big),$$

where $B_j$ is a Bernstein polynomial of degree $k$,

$$B_j(u; k, \omega_j) = \sum_{r=1}^{k} \omega_{j,r} \, I_u(r,\, k - r + 1), \qquad \omega_{j,r} \ge 0, \quad \sum_{r=1}^{k} \omega_{j,r} = 1,$$

$I_u(a, b)$ denotes the regularized incomplete beta function, and $h_j$ is a fixed monotone map from $[0, 1]$ to the support of $x_j$. By the change-of-variables formula, this nonparametric transformation induces each marginal

$$q_j(x_j) = \phi\big( f_j^{-1}(x_j) \big) \left| \frac{\partial f_j^{-1}(x_j)}{\partial x_j} \right|,$$

allowing it to match a wide variety of posterior shapes, including skewed, heavy-tailed, and multimodal forms.
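The sketch below implements the sandwich transform under two illustrative assumptions not fixed by the text above: the weights are parameterized by a softmax so they stay on the simplex, and the transfer function $h$ is taken to be $\Phi^{-1}$, mapping $(0, 1)$ back to the real line.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import betainc, ndtr, ndtri

def bernstein_cdf(u, log_w):
    """B(u; k, w) = sum_r w_r * I_u(r, k - r + 1) with w = softmax(log_w)."""
    k = log_w.shape[0]
    w = jax.nn.softmax(log_w)                 # weights on the probability simplex
    r = jnp.arange(1.0, k + 1.0)
    # regularized incomplete beta I_u(r, k - r + 1), one term per basis function
    terms = betainc(r, k - r + 1.0, u[..., None])
    return terms @ w

def sandwich_transform(z, log_w):
    """f(z) = h(B(Phi(z))), with h = ndtri as an illustrative transfer function."""
    return ndtri(bernstein_cdf(ndtr(z), log_w))

# With uniform weights, B is the identity and f(z) = z; skewing the weights
# toward low-order basis functions skews the induced marginal.
z = jnp.linspace(-3.0, 3.0, 5)
print(sandwich_transform(z, jnp.zeros(8)))                # ~ z itself
print(sandwich_transform(z, jnp.linspace(2.0, -5.0, 8)))  # skewed marginal
```

With uniform weights, $B$ reduces to the identity (the $k$ basis CDFs average to $u$), so the induced marginal is exactly Gaussian; learning non-uniform weights is what bends the marginal away from normality.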
4. Variational Objective and Optimization
With the variational family

$$q(x) = c_q\big(Q_1(x_1), \dots, Q_d(x_d); \Upsilon\big) \prod_{j=1}^{d} q_j(x_j),$$

the KL divergence decomposes additively thanks to Sklar's representation:

$$\mathrm{KL}\big(q \,\|\, p\big) = \mathbb{E}_q\!\left[ \log \frac{c_q\big(Q_1(x_1), \dots, Q_d(x_d)\big)}{c_p\big(F_1(x_1), \dots, F_d(x_d)\big)} \right] + \sum_{j=1}^{d} \mathrm{KL}\big(q_j \,\|\, p_j\big).$$

The evidence lower bound (ELBO) becomes

$$\mathcal{L}(\lambda) = \mathbb{E}_{q}\big[ \log p(y, x) - \log q(x) \big],$$

where $\lambda$ collects the parametric and nonparametric parameters (copula correlations, mean, covariance, and Bernstein weights), which are optimized jointly using reparameterization gradients. Writing $x = f(\mu + L\varepsilon)$ with $\varepsilon \sim \mathcal{N}(0, I)$,

$$\nabla_\lambda \mathcal{L} = \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, I)}\Big[ \nabla_\lambda \Big( \log p\big(y, f(\mu + L\varepsilon)\big) - \log q\big(f(\mu + L\varepsilon)\big) \Big) \Big],$$

and Monte Carlo estimates of this expectation yield unbiased gradients. Derivatives such as $\partial \mathcal{L} / \partial \omega_{j,r}$ are computed from samples of the transformed Gaussian, propagating gradients through both the copula parameters and the marginal transformations.
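The sketch below assembles these pieces into a single-sample reparameterized ELBO gradient in JAX. The target `log_p` is a toy unnormalized log-posterior standing in for $\log p(y, x)$, one shared set of Bernstein weights is used across coordinates for brevity, and the log-Jacobian of $f$ is written out analytically via the chain rule rather than by nested autodiff; none of these choices come from the paper.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import betainc, ndtr, ndtri
from jax.scipy.stats import beta, multivariate_normal, norm

def bernstein_parts(u, log_w):
    """Bernstein CDF B(u) and its density B'(u) with softmax weights."""
    k = log_w.shape[0]
    w = jax.nn.softmax(log_w)
    r = jnp.arange(1.0, k + 1.0)
    B = betainc(r, k - r + 1.0, u[..., None]) @ w
    B_prime = beta.pdf(u[..., None], r, k - r + 1.0) @ w  # mixture-of-betas density
    return B, B_prime

def log_p(x):
    """Toy unnormalized log-posterior (correlated, non-Gaussian)."""
    return -0.5 * jnp.sum(x ** 2) - 0.5 * (x[0] * x[1] - 1.0) ** 2

def elbo(params, eps):
    """One-sample reparameterized ELBO estimate: z = mu + L eps, x = f(z)."""
    mu, log_w = params["mu"], params["log_w"]
    L = jnp.tril(params["L"])                 # Cholesky factor of the covariance
    z = mu + L @ eps
    u = ndtr(z)
    B, B_prime = bernstein_parts(u, log_w)
    x = ndtri(B)                              # f(z) = ndtri(B(ndtr(z)))
    # chain rule: f'(z) = phi(z) * B'(u) / phi(x), applied coordinatewise
    log_df_dz = norm.logpdf(z) + jnp.log(B_prime) - norm.logpdf(x)
    # change of variables: log q(x) = log N(z; mu, LL^T) - sum_j log |f'(z_j)|
    log_q = multivariate_normal.logpdf(z, mu, L @ L.T) - jnp.sum(log_df_dz)
    return log_p(x) - log_q

params = {"mu": jnp.zeros(2), "L": jnp.eye(2), "log_w": jnp.zeros(8)}
eps = jax.random.normal(jax.random.PRNGKey(1), (2,))
grads = jax.grad(elbo)(params, eps)           # unbiased single-sample gradient
print(jax.tree_util.tree_map(jnp.shape, grads))
```

Averaging this estimator over a minibatch of `eps` draws and feeding the gradients to any stochastic optimizer recovers the joint update over copula and marginal parameters described above.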
5. Comparison with Mean-Field and Fixed-Form Approximations
Classical mean-field variational Bayes (MFVB) assumes

$$q(x) = \prod_{j=1}^{d} q_j(x_j),$$

which corresponds to an independence copula ($c_q \equiv 1$) and forfeits the ability to model posterior dependencies, leading to the well-known underestimation of posterior variances. In variational Gaussian (VG) approximations,

$$q(x) = \mathcal{N}(x;\, \mu, \Sigma),$$

dependencies are accounted for, but all marginals are forced to be Gaussian, losing flexibility. The VGC approach retains both joint dependency (via the copula) and marginal flexibility (via the nonparametric transformation), allowing simultaneous modeling of intricate dependencies and complex marginal posterior behavior, such as non-normality and multimodality, a combination accessible in neither MFVB nor VG.
6. Empirical and Computational Properties
The major empirical finding is that VGC achieves higher ELBO values and more accurate posterior approximations in hierarchical models compared to both MFVB and VG. The application of reparameterization and stochastic optimization makes the approach automated and computationally efficient in practice. The method is especially advantageous when conditional conjugacy is lacking or when the marginal posteriors are notably complex.
A summary of comparative properties:
| Method | Joint Dependencies | Marginal Flexibility | Computational Cost |
|---|---|---|---|
| Mean-field (MFVB) | No | Parametric (fixed form) | Low |
| Variational Gaussian (VG) | Yes | Gaussian only | Moderate |
| VGC (copula + Bernstein polynomials) | Yes | Nonparametric (flexible) | Moderate/high (but scalable) |
7. Applications and Generality
The hierarchical variational family embodied by VGC is particularly suitable for complex Bayesian models, especially hierarchical models with non-conjugate, non-standard full conditionals and highly structured dependencies among latent variables. The copula-based construction enables accurate uncertainty quantification for, e.g., mixed-effects models, factor analyzers, and nonparametric models where both dependency modeling and flexible marginal adaptation are vital. Its automated, scalable inference supports application to large-scale, high-dimensional Bayesian inference tasks where mean-field variational inference fails to provide reliable posterior characterizations.
In conclusion, hierarchical variational families, as instantiated by the variational Gaussian copula methodology, provide an expressive and tractable approach for variational inference in hierarchical Bayesian models—successfully capturing both the joint dependency structure and flexible non-Gaussian marginal posteriors, and offering a principled mechanism for improved inference accuracy and posterior coverage in the presence of challenging posterior geometries (Han et al., 2015).