
Variational Bernstein–von Mises Theorem

Updated 5 August 2025
  • The Variational Bernstein–von Mises theorem is a formal result that establishes Gaussian approximations for variational Bayes posteriors in complex, high-dimensional parametric and latent variable models.
  • It leverages a local quadratic (LAN) expansion of the variational log-likelihood to derive finite-sample error bounds and ensure consistency and asymptotic normality as the parameter dimension grows.
  • The theorem underpins practical uncertainty quantification in models like Gaussian mixtures by rigorously connecting computational VB methods with classical frequentist coverage guarantees.

The Variational Bernstein–von Mises (VB-BvM) theorem formalizes the asymptotic behavior and frequentist validity of variational Bayes (VB) approximations to the Bayesian posterior in parametric and latent variable models, especially in high-dimensional and nonconjugate settings. Recent advances make explicit, finite-sample quantitative connections between the local quadratic form of the variational log-likelihood and Gaussian approximations to the VB posterior, yielding both consistency and asymptotic normality with increasing parameter dimension. Finite-sample control further enables practical performance guarantees and sharp error characterization for variational uncertainty quantification.

1. Theoretical Foundations of the VB-BvM Theorem

The VB-BvM theorem is derived in a non-asymptotic regime for a class of regular parametric models with latent variables. A core device is a local quadratic approximation ("LAN expansion") of the empirical variational log-likelihood $M_n(\theta; x)$. In a localized parameter set $\Theta_0(r_0)$ around the target value,

$$M_n(\theta; x) = M_n(\theta^*; x) + (\theta - \theta^*)^\top \nabla M_n(\theta^*; x) - \tfrac{1}{2} \|D_0(\theta - \theta^*)\|^2 + R,$$

where $\theta^*$ is the "true" parameter, $D_0$ is a scaling matrix related to the local Fisher information $V_{\theta_0}$ by $D_0^2 = n V_{\theta_0}$, and $R$ is a remainder controlled by a local $\Delta(r_0, y)$ term (see Eq. (LAN expansion) in the cited work).
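
To make the expansion concrete, here is a minimal numerical sketch in a toy one-dimensional Poisson model (an illustrative stand-in, not the latent variable setting of the cited work), where $M_n$ is the exact log-likelihood; it evaluates the remainder $R$ at several localization radii:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star, n = 2.0, 5000             # toy "true" parameter and sample size
x = rng.poisson(theta_star, size=n)

def M_n(theta):
    # Poisson log-likelihood, dropping the theta-free log(x_i!) terms
    return np.sum(x * np.log(theta) - theta)

grad = np.sum(x / theta_star - 1.0)   # score, nabla M_n(theta_star; x)
V = 1.0 / theta_star                  # per-observation Fisher information
D0_sq = n * V                         # D_0^2 = n V_{theta_0} (scalar here)

for h in (0.5, 0.1, 0.02):            # localization radii
    quad_part = M_n(theta_star) + h * grad - 0.5 * D0_sq * h**2
    R = M_n(theta_star + h) - quad_part
    print(f"h = {h:5.2f}   remainder R = {R:+10.4f}")
# R mixes an O(sqrt(n) h^2) information-fluctuation term with O(n h^3)
# cubic terms; both vanish as the localization radius shrinks.
```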

This expansion enables construction of a "VB ideal posterior" of the form

$$\pi_{\mathrm{VB}}^{*}(\theta) \propto \exp\left\{ M_n(\theta; x) \right\} p(\theta).$$

The quadratic approximation in high-probability local sets ensures that this posterior is close in total variation to the normal density $\mathcal{N}(\theta^{*} + V_{\theta_0}^{-1} \Delta_n, (n V_{\theta_0})^{-1})$, with $\Delta_n$ a scaled score-like term. Crucially, the theorem provides explicit error bounds controlling the accuracy of this Gaussian approximation in terms of dimension $p$, sample size $n$, and the localization radius, rather than relying on standard asymptotics.

Representative quantitative bounds include, with $p$ possibly increasing with $n$,

$$\left|\log \int p(\theta) \exp\{M_n(\theta; x)\}\, d\theta + (1/2) \log\det(V_{\theta_0}) + (p/2) \log n - M_n(\theta^*; x) - \log p(\theta^*) - (p/2) \log(2\pi) - (1/2)\Delta_n^\top V_{\theta_0}\Delta_n\right| \leq \text{Error}(r_0, n, p).$$
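
As a sanity check on a bound of this shape, the sketch below evaluates both sides for $p = 1$ in the same toy Poisson model with an $\mathrm{Exp}(1)$ prior, computing the integral by quadrature. The scaling $\Delta_n = V_{\theta_0}^{-1}\nabla M_n(\theta^*; x)/\sqrt{n}$ is an assumption made for illustration, since the excerpt does not pin down the normalization:

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(1)
theta_star, n = 2.0, 2000
x = rng.poisson(theta_star, size=n)
S = x.sum()

def M_n(theta):
    # Poisson log-likelihood, dropping the theta-free log(x_i!) terms
    return S * np.log(theta) - n * theta

def log_prior(theta):
    # Exp(1) prior on (0, infinity); a toy choice
    return -theta

# Left side: log normalizing integral by quadrature, rescaled by the
# integrand's peak value for numerical stability.
shift = M_n(S / n)
Z, _ = quad(lambda t: np.exp(log_prior(t) + M_n(t) - shift),
            1.0, 3.0, points=[S / n])
lhs = np.log(Z) + shift

# Right side: the Laplace/Gaussian approximation implied by the bound.
# Quadratic term (1/2) Delta_n' V Delta_n equals grad^2 * theta_star / (2n)
# under the assumed scaling Delta_n = V^{-1} grad / sqrt(n).
V = 1.0 / theta_star                  # per-observation Fisher information
grad = S / theta_star - n             # score at theta_star
rhs = (M_n(theta_star) + log_prior(theta_star)
       + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(n * V)
       + grad**2 * theta_star / (2 * n))
print(f"lhs = {lhs:.4f}   rhs = {rhs:.4f}   |gap| = {abs(lhs - rhs):.2e}")
```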

2. Consistency and Asymptotic Normality of the VB Posterior

Given the local quadratic expansion and mild identifiability and moment assumptions, the following two properties hold for the VB estimator $\hat\theta_{VB}$ (or the mean/location parameter of the variational family):

  • Consistency: In high-dimensional scaling, $\|\hat\theta_{VB} - \theta^*\| = O_p(\sqrt{p/n})$, i.e., the estimator converges to the "truth" at the parametric rate with an explicit dependence on $p$.
  • Asymptotic Normality: For any fixed direction $\alpha$, the distribution of the inference error is asymptotically normal:

$$\sqrt{n}\, \alpha^\top (\hat\theta_{VB} - \theta^*) / \sigma_\alpha \to_d N(0,1),$$

with $\sigma_\alpha^2 = \alpha^\top V_{\theta_0}^{-1} \mathrm{Var}(\nabla m(\theta^*; x)) V_{\theta_0}^{-1} \alpha$.

The analysis follows from Pinsker-type inequalities and tight control over the difference between the actual variational minimum and the quadratic minimizer, showing the VB posterior is close in total variation to a normal law with mean and covariance as above.
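
A quick Monte Carlo illustration of the directional CLT, once more in the toy Poisson model: there the posterior is tractable and the sample mean serves as a stand-in for the VB location estimate, and because the model is well specified the sandwich variance collapses to $\sigma_\alpha^2 = \theta^*$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta_star, n, reps = 2.0, 1000, 4000
# Sandwich variance alpha' V^{-1} Var(grad m) V^{-1} alpha: for the
# well-specified Poisson model, Var(grad m) = V = 1/theta_star, so
# sigma_alpha^2 reduces to theta_star.
sigma = np.sqrt(theta_star)

z = np.empty(reps)
for r in range(reps):
    x = rng.poisson(theta_star, size=n)
    z[r] = np.sqrt(n) * (x.mean() - theta_star) / sigma  # stand-in for VB mean

# Standardized errors should be approximately N(0, 1); the KS test
# against the standard normal should not reject.
print(stats.kstest(z, "norm"))
```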

3. Application to Latent Variable Models: Multivariate Gaussian Mixture Models

For multivariate Gaussian mixture models (GMMs), the variational log-likelihood coincides (modulo permutation) with the observed-data log-likelihood after optimization over local responsibilities:

$$m(\mu; x_i) = -\log K - (p/2)\log(2\pi) + \log \left\{ \sum_{k} \exp\left(-\tfrac{1}{2}\|x_i - \mu_k\|^2\right) \right\}.$$

The VB-BvM theorem applies after accounting for label switching, with convergence of the variational posterior (or its symmetrized version) to the normal, even as the number of components and the parameter dimension grow with the sample size.
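
A direct transcription of this profile objective, using a numerically stable log-sum-exp; the unit covariances and equal weights come from the display above, while the array shapes and toy data are illustrative assumptions:

```python
import numpy as np
from scipy.special import logsumexp

def m(mu, x):
    """Profile variational log-likelihood m(mu; x_i) for an equal-weight,
    unit-covariance K-component GMM, as in the display above.

    mu : (K, p) array of component means; x : (n, p) array of observations.
    Returns an (n,) array of per-observation values."""
    K, p = mu.shape
    # -0.5 * ||x_i - mu_k||^2 for every (i, k) pair, shape (n, K)
    sq = -0.5 * np.sum((x[:, None, :] - mu[None, :, :]) ** 2, axis=-1)
    return -np.log(K) - 0.5 * p * np.log(2 * np.pi) + logsumexp(sq, axis=1)

# Toy usage: K = 3 well-separated components in p = 2 dimensions.
rng = np.random.default_rng(3)
mu = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x = rng.normal(size=(5, 2)) + mu[rng.integers(0, 3, size=5)]
print(m(mu, x))
```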

In GMMs, the local quadratic expansion becomes exact due to the explicit combinatorial structure, and the principal technical task is controlling the behavior of the variational solution over the parameter space, using symmetry and moment bracketing.

4. High-Dimensional Regimes: Explicit Non-Asymptotic Control

A significant feature is that both $n$ and $p$ (or $K$, the number of mixture components) can increase. The expansion is carried out for $\theta \in \Theta_0(r_0)$ with $h = \sqrt{n}(\theta - \theta^*)$, and error terms are given explicitly in $p$ and $n$ (e.g., $p^{3/2}/\sqrt{n}$). Theoretical results demonstrate that, provided $p^3 = o(n)$ (or similar constraints), the VB posterior remains well characterized by its Gaussian approximation. Exponential-moment, bracketing, and tail-control arguments ensure the "edge" mass outside the localized parameter set is negligible. This is critical for modern applications where $p$ grows with the sample size.
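
To get a feel for when such a bound is informative, the snippet below tabulates the leading error scale $p^{3/2}/\sqrt{n}$ on an arbitrary grid of dimension and sample-size pairs:

```python
# Leading error scale p^(3/2)/sqrt(n): small only when p^3 = o(n).
for n in (10**4, 10**6, 10**8):
    for p in (10, 50, 200):
        print(f"n = {n:>9}   p = {p:>4}   "
              f"p^(3/2)/sqrt(n) = {p**1.5 / n**0.5:8.3f}")
```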

5. Comparison to Classical and Recent Theory

Whereas most existing theory for variational Bayes focuses on fixed-dimension settings or conjugate exponential-family models, the VB-BvM theorem here draws on empirical process theory and non-asymptotic local quadratic approximation for general parametric and latent variable models. Notable features include:

  • Finite-sample error bounds featuring explicit dependence on $p$ and $n$.
  • Accommodation of label-switching and identifiability via permutation-invariant analysis.
  • Use of bracketing and VC (Vapnik–Chervonenkis) complexity arguments for uniform control in high dimensions.
  • Explicit demonstration that the VB posterior's uncertainty quantification is valid in the same sense as for the full Bayesian posterior: credible sets have correct frequentist coverage under the appropriate scaling.

A plausible implication is that this finite-sample, dimension-explicit control is necessary for principled application of VB approximations in large-scale modern inference problems.

6. Implications and Scope

The VB-BvM theorem rigorously establishes that under regularity and localization, the variational posterior behaves as a Gaussian measure with mean near the true parameter and covariance prescribed by the variational curvature, even in increasing dimension regimes. Consistency and asymptotic normality hold with explicit rates. The approach applies to a wide class of models, including but not limited to GMMs, supporting the validity of computationally efficient VB methods for uncertainty quantification.

By providing sharp non-asymptotic error bounds and demonstrating robustness to high-dimensional scaling, the theory also enables new diagnostics for assessing the practical accuracy of VB in modern latent variable and mixture models, bridging computational tractability and statistical validity. It unifies the statistical understanding of VB with classical Bernstein–von Mises phenomena, giving practitioners and theorists a precise tool to analyze uncertainty statements arising from variational Bayesian procedures in complex settings.