
Bernstein–von Mises Theorem Overview

Updated 16 December 2025
  • The Bernstein–von Mises theorem is a foundational result showing that, under regularity conditions, the posterior becomes asymptotically normal and centered on efficient estimators.
  • It extends to high-dimensional, semiparametric, and nonparametric models, with quantified Gaussian approximation errors and asymptotically valid frequentist coverage.
  • Practical applications include online inference, inverse problems, and functional estimation, with errors controlled rigorously via concentration inequalities and Laplace approximations.

The Bernstein–von Mises (BvM) theorem is a foundational result in asymptotic Bayesian theory that establishes the frequentist validity of Bayesian credible sets by proving that, under suitable regularity conditions, the posterior distribution for parameters of interest becomes asymptotically normal and centered at an efficient estimator, with variance matching the optimal frequentist information bound. Originally formulated for regular parametric models, BvM-type results now encompass a wide spectrum of statistical models, including high-dimensional and semiparametric regimes, nonparametric functionals, and inverse problems, as well as modern computational settings such as online and variational Bayes frameworks.

1. Classical Bernstein–von Mises Theorem

In classical finite-dimensional parametric models, let $X_1,\ldots,X_n$ be i.i.d. with density $p_\theta$, $\theta \in \Theta \subset \mathbb{R}^d$. Under regularity conditions (local asymptotic normality, prior positivity, smoothness), the posterior distribution of $\theta$ given the data concentrates around the maximum likelihood estimator $\hat\theta_n$ at rate $n^{-1/2}$ and is asymptotically normal:

$$\left\| \Pi\bigl(\sqrt{n}(\theta-\hat\theta_n)\in\cdot \mid X^{(n)}\bigr) - \mathcal{N}\bigl(0, I_0^{-1}\bigr) \right\|_{\mathrm{TV}} \to 0,$$

where $I_0$ is the Fisher information at the true parameter $\theta_0$ (Chae et al., 2016, Katsevich, 2023, Spokoiny, 2013). Thus, Bayesian credible sets become asymptotically valid frequentist confidence sets, and Bayesian point estimators (such as the posterior mean or median) are asymptotically efficient in the sense of attaining the Cramér–Rao bound.
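
A minimal numerical sketch of this statement, assuming a Bernoulli model with a Beta(1,1) prior (the prior, true parameter, and sample size below are arbitrary choices for illustration, not tied to any cited paper): the exact Beta posterior of the rescaled parameter $\sqrt{n}(\theta - \hat\theta_n)$ is compared with the $N(0, I_0^{-1})$ limit, where $I_0 = 1/(\theta_0(1-\theta_0))$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta0, n = 0.3, 2000                       # true success probability and sample size (assumed)
x = rng.binomial(1, theta0, size=n)
s = x.sum()

theta_hat = s / n                           # maximum likelihood estimator
posterior = stats.beta(1 + s, 1 + n - s)    # exact posterior under a Beta(1,1) prior

# Density of u = sqrt(n) * (theta - theta_hat) under the posterior (change of variables),
# versus the N(0, I_0^{-1}) limit with I_0 = 1 / (theta0 * (1 - theta0)).
u = np.linspace(-4.0, 4.0, 2001)
post_density = posterior.pdf(theta_hat + u / np.sqrt(n)) / np.sqrt(n)
limit_density = stats.norm(0.0, np.sqrt(theta0 * (1 - theta0))).pdf(u)

# Approximate total variation distance on the grid; it shrinks as n grows.
tv = 0.5 * np.sum(np.abs(post_density - limit_density)) * (u[1] - u[0])
print(f"MLE = {theta_hat:.4f}, approximate TV distance = {tv:.4f}")
```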

2. High-Dimensional and Nonasymptotic Regimes

In modern settings with growing parameter dimension $p = p_n$, the BvM theorem quantifies how the validity of the Gaussian approximation degrades:

  • For regular parametric models, the total variation distance between the posterior and the target normal law is $O\big((p^3/n)^{1/2}\big)$; the condition $p^3 \ll n$ is therefore required for this bound to vanish and the classical BvM phenomenon to hold (Spokoiny, 2013).
  • Improved analyses show that for generalized linear models and multinomial data, $n \gg d^2$ suffices, with explicit nonasymptotic bounds on the total variation error in terms of Hessian Lipschitz constants and third derivatives of the log-likelihood (Katsevich, 2023).
  • These results rely on careful control of Laplace approximation errors and explicit use of concentration inequalities for high-dimensional central limit problems; a minimal Laplace-approximation sketch follows the table below.
Setting            | Required growth condition | Reference
------------------ | ------------------------- | -----------------
Parametric MLE/BvM | $p^3 \ll n$               | (Spokoiny, 2013)
GLM/multinomial    | $d^2 \ll n$               | (Katsevich, 2023)
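
As a concrete illustration of the Laplace approximation underlying these bounds, the sketch below fits a small logistic regression with a flat prior (the dimensions, design, and flat prior are assumptions of the example): the approximating Gaussian is $N(\hat\theta, H^{-1})$, where $\hat\theta$ is the posterior mode and $H$ is the Hessian of the negative log-posterior at the mode.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, d = 5000, 5                                   # n >> d^2, in line with the GLM-type condition (assumed sizes)
X = rng.normal(size=(n, d))
theta_true = rng.normal(scale=0.5, size=d)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ theta_true))))

def neg_log_post(theta):
    # Flat prior, so the negative log-posterior equals the negative log-likelihood.
    z = X @ theta
    return np.sum(np.logaddexp(0.0, z)) - y @ z

def neg_log_post_hessian(theta):
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (X * (p * (1.0 - p))[:, None]).T @ X

# Posterior mode and Laplace (Gaussian) approximation N(mode, H^{-1}).
opt = minimize(neg_log_post, np.zeros(d), method="BFGS")
mode = opt.x
cov = np.linalg.inv(neg_log_post_hessian(mode))

print("posterior mode:", np.round(mode, 3))
print("Laplace posterior standard deviations:", np.round(np.sqrt(np.diag(cov)), 4))
```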

3. Semiparametric Bernstein–von Mises Theorems

Semiparametric BvM theorems address models with both finite-dimensional "parameters of interest" $\theta$ and infinite-dimensional nuisance parameters $\eta$. Under local asymptotic normality (LAN) in the least-favorable submodel, sufficient prior mass on Kullback–Leibler neighborhoods, and metric entropy bounds on the nuisance parameter set, these results show that the marginal posterior for $\theta$ is asymptotically Gaussian:

$$\sup_{B \subset \mathbb{R}^p}\left| \Pi\bigl( \sqrt{n}(\theta-\theta_0) \in B \mid X_{1:n} \bigr) - N(\Delta_n, I_0^{-1})(B) \right| \to 0,$$

where

$$\Delta_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n I_0^{-1} g_0(X_i)$$

and $g_0$ is the efficient score (Bickel et al., 2010, Collaboration et al., 2015, Chae, 2015, Franssen et al., 29 Nov 2024). Efficiency and frequentist validity thus extend to Bayesian estimation procedures even in the presence of infinite-dimensional nuisance components.
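
A minimal conjugate illustration of this marginal-posterior statement, assuming a Normal model with mean $\theta$ of interest and nuisance variance $\sigma^2$, under the standard noninformative prior $\pi(\theta,\sigma^2) \propto 1/\sigma^2$ (these modelling choices are assumptions of the sketch, not taken from the cited papers): the marginal posterior of $\theta$ is a Student-$t$ that, after centering and scaling, approaches $N(\Delta_n, I_0^{-1})$ with $\Delta_n = \sqrt{n}(\bar X - \theta_0)$ and $I_0 = 1/\sigma_0^2$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta0, sigma0, n = 1.0, 2.0, 500                # true mean, true sd, sample size (assumed)
x = rng.normal(theta0, sigma0, size=n)
xbar, s = x.mean(), x.std(ddof=1)

# Marginal posterior of theta under pi(theta, sigma^2) proportional to 1/sigma^2:
# theta | data ~ xbar + (s / sqrt(n)) * t_{n-1}.
marginal_post = stats.t(df=n - 1, loc=xbar, scale=s / np.sqrt(n))

# BvM limit for the rescaled parameter sqrt(n) * (theta - theta0):
# N(Delta_n, I_0^{-1}) with Delta_n = sqrt(n) * (xbar - theta0) and I_0 = 1 / sigma0^2.
delta_n = np.sqrt(n) * (xbar - theta0)
limit = stats.norm(delta_n, sigma0)

u = np.linspace(delta_n - 4 * sigma0, delta_n + 4 * sigma0, 2001)
post_density_u = marginal_post.pdf(theta0 + u / np.sqrt(n)) / np.sqrt(n)
tv = 0.5 * np.sum(np.abs(post_density_u - limit.pdf(u))) * (u[1] - u[0])
print(f"Delta_n = {delta_n:.3f}, approximate TV distance = {tv:.4f}")
```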

Example: Semiparametric Mixtures

For mixture models $p_{\theta,F}(x) = \int p_\theta(x \mid z)\, dF(z)$, with a Dirichlet process or species sampling prior for $F$, a semiparametric BvM theorem holds for the parameter $\theta$ provided the LAN condition is established along least-favorable submodels $F_t$ and the prior is sufficiently reparameterization-invariant. This guarantees that posterior inference for $\theta$ is efficient and yields asymptotically correct frequentist coverage, e.g. in frailty and errors-in-variables models (Franssen et al., 29 Nov 2024).

4. Nonparametric and Functional BvM Theorems

Nonparametric BvM results address infinite-dimensional parameters, often under Gaussian process priors, for function estimation and linear inverse problems. These results focus on:

  • Weak convergence of the centered and scaled posterior in appropriate Banach spaces (e.g., multiscale, dual, or Sobolev spaces) (Castillo et al., 2013, Giordano et al., 2018, Rømer, 17 Sep 2024, Nickl, 2017).
  • Asymptotically valid credible sets or “credible bands” for functionals or the entire function parameter, with diameters matching minimax rates.
  • Semiparametric BvM for linear functionals, yielding normal posteriors with the efficient semiparametric variance (Giordano et al., 2018).

A notable result is that, under Gaussian or sieve priors, the posterior for a linear functional $L(f)$ of a function $f$ has the asymptotic form

$$L(f) \mid Y \approx N\bigl(L(\bar{f}),\, \varepsilon^2 \|A\psi\|_G^2\bigr),$$

where $\bar{f}$ is the posterior mean, $\psi$ is the representer of the functional $L$, and $A$ is a model-dependent operator (Giordano et al., 2018).
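
A conjugate sequence-space sketch of this form (the white-noise model, truncation level, prior decay, and representer below are all assumptions of the example, not the setting of the cited papers): with observations $Y_k = f_k + \varepsilon g_k$ and independent $N(0, \lambda_k)$ priors on the coefficients, the posterior of a linear functional $L(f) = \sum_k \psi_k f_k$ is exactly Gaussian, centered at $L(\bar f)$ with an explicitly computable variance.

```python
import numpy as np

rng = np.random.default_rng(3)
K, eps = 200, 0.02                              # truncation level and noise level (assumed)
k = np.arange(1, K + 1)

f_true = k ** (-1.5) * np.cos(k)                # a smooth "true" coefficient sequence (assumed)
lam = k ** (-2.0)                               # prior variances: f_k ~ N(0, lam_k)
psi = k ** (-2.0)                               # representer of the linear functional L(f) = sum_k psi_k f_k

y = f_true + eps * rng.normal(size=K)           # white-noise observations Y_k = f_k + eps * g_k

# Conjugate Gaussian posterior, coordinatewise:
# f_k | Y ~ N( lam_k y_k / (lam_k + eps^2), lam_k eps^2 / (lam_k + eps^2) ).
post_mean = lam * y / (lam + eps ** 2)
post_var = lam * eps ** 2 / (lam + eps ** 2)

# The linear functional L(f) is therefore exactly Gaussian under the posterior.
L_mean = psi @ post_mean
L_var = psi ** 2 @ post_var
print(f"posterior of L(f): N({L_mean:.4f}, {L_var:.3e}); true L(f) = {psi @ f_true:.4f}")
```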

5. Applications and Generalizations

High-dimensional and Online Inference

  • An “online” BvM theorem has been established for sequential Bayesian updating with variational Gaussian approximations. Under an appropriate batch size $m \gg p^2$, the recursion ensures the final posterior is close in total variation to the full posterior, with error $O(p^{3/2}/n^{1/2})$ (Lee et al., 8 Apr 2025); an exactly conjugate sequential-updating analogue is sketched after this list.
  • For high-dimensional settings, the precise impact of third derivatives and log-concavity on Laplace approximation accuracy has been quantified, leading to deterministic bounds for the error of Gaussian approximation in total variation and Kullback–Leibler divergence (Dehaene, 2019).
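
The cited online scheme uses variational Gaussian approximations; the sketch below shows a simpler, exactly conjugate analogue (Bayesian linear regression with Gaussian prior and noise, all settings assumed for illustration), where batch-by-batch Gaussian updating recovers the full-data posterior exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, batch = 10_000, 4, 500                     # batch size chosen with m >> p^2 in mind (assumed)
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.25, 0.0])
sigma2 = 1.0
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Gaussian prior beta ~ N(0, tau2 * I); with Gaussian noise the posterior stays Gaussian,
# so sequential (batch-by-batch) updating is exact and matches the full-data posterior.
tau2 = 10.0
precision = np.eye(p) / tau2                     # running posterior precision
shift = np.zeros(p)                              # running (precision @ mean) vector

for start in range(0, n, batch):
    Xb, yb = X[start:start + batch], y[start:start + batch]
    precision += Xb.T @ Xb / sigma2
    shift += Xb.T @ yb / sigma2

online_mean = np.linalg.solve(precision, shift)

# Full-data posterior mean for comparison.
full_precision = np.eye(p) / tau2 + X.T @ X / sigma2
full_mean = np.linalg.solve(full_precision, X.T @ y / sigma2)
print("max |online - full| posterior mean difference:",
      np.max(np.abs(online_mean - full_mean)))
```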

Covariance Matrix and Functional Inference

  • BvM theorems for functionals of covariance matrices (entries, log-determinant, eigenvalues) in high dimensions, including explicit scaling of error in terms of ranks and dimension (Gao et al., 2014).
  • Nonparametric BvM for Dirichlet process priors: convergence of Laplace transforms of the posterior measure to those of a Brownian bridge process, uniformly over function classes of bounded variation, showing the posterior law for functionals is (weakly) asymptotically normal (Ray et al., 2020); a Bayesian-bootstrap sketch of such a functional limit follows this list.
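
A Monte Carlo sketch in the spirit of the Dirichlet-process result, using the Bayesian bootstrap (the $\alpha \to 0$ limit of the Dirichlet-process posterior; the data-generating distribution and functional below are assumptions of the example): posterior draws of the mean functional $\int x\, dF(x)$ are compared with the $N(\bar X, S^2/n)$ normal approximation.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
x = rng.exponential(scale=1.0, size=n)           # observed data (assumed distribution)

# Bayesian-bootstrap draws (the alpha -> 0 limit of the Dirichlet-process posterior):
# F | data ~ sum_i W_i * delta_{x_i},  W ~ Dirichlet(1, ..., 1).
n_draws = 5000
W = rng.dirichlet(np.ones(n), size=n_draws)
functional_draws = W @ x                         # posterior draws of the mean functional

# BvM-style comparison: the posterior of the functional should be close to N(xbar, s^2 / n).
xbar, s2 = x.mean(), x.var(ddof=1)
print(f"posterior mean/sd of functional: {functional_draws.mean():.4f} / {functional_draws.std():.4f}")
print(f"normal approximation mean/sd   : {xbar:.4f} / {np.sqrt(s2 / n):.4f}")
```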

Inverse and Ill-posed Problems

  • For statistical inverse problems, both parametric and semiparametric BvM theorems have been established, including for nonlinear, high-dimensional settings. The scaling involves the ill-posedness of the forward map, with an explicit growth rate $n \gg [\text{ill-posedness factor}]^2 \cdot d^{3/2} \log d$ needed for normality (Lu, 2017, Bohr, 2022, Magra et al., 2023).
  • For linear inverse problems, one obtains semiparametric BvM results for functionals: the posterior mean is an efficient estimator and credible sets are valid frequentist confidence sets (Giordano et al., 2018, Magra et al., 2023, Nickl, 2017); a conjugate Gaussian sketch for a linear functional follows this list.
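
A discretized conjugate sketch of the linear-inverse-problem case (the diagonal forward operator, noise level, prior, and functional below are assumptions of the example): with $Y = Af + \varepsilon\,\xi$ and a Gaussian prior, the posterior is Gaussian, so the posterior mean and a credible interval for a linear functional $\psi^\top f$ can be read off in closed form.

```python
import numpy as np

rng = np.random.default_rng(6)
K, eps = 100, 0.01                               # discretization level and noise level (assumed)
k = np.arange(1, K + 1)

A = np.diag(k ** -1.0)                           # mildly ill-posed diagonal forward operator (assumed)
f_true = k ** (-1.5)
y = A @ f_true + eps * rng.normal(size=K)

Sigma = np.diag(k ** (-2.0))                     # Gaussian prior f ~ N(0, Sigma)

# Conjugate Gaussian posterior for Y = A f + eps * noise:
# cov = (A^T A / eps^2 + Sigma^{-1})^{-1},  mean = cov @ A^T y / eps^2.
post_cov = np.linalg.inv(A.T @ A / eps ** 2 + np.linalg.inv(Sigma))
post_mean = post_cov @ A.T @ y / eps ** 2

# Posterior law of a linear functional L(f) = psi^T f is exactly Gaussian.
psi = k ** (-2.0)
L_mean = psi @ post_mean
L_sd = np.sqrt(psi @ post_cov @ psi)
print(f"95% credible interval for L(f): "
      f"[{L_mean - 1.96 * L_sd:.4f}, {L_mean + 1.96 * L_sd:.4f}]; true L(f) = {psi @ f_true:.4f}")
```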

6. Technical Underpinnings and Proof Strategies

All BvM-type results hinge on verifying the following:

  • Local asymptotic normality (LAN) in appropriate directions (parametric, semiparametric, or functional); a numerical LAN check is sketched after this list.
  • Sufficient posterior concentration around the true parameter/subspace at appropriate rates.
  • Change-of-measure or no-bias conditions for the prior, ensuring invariance of the prior under local shifts aligned with the efficient score direction (Bickel et al., 2010, Franssen et al., 29 Nov 2024).
  • Control of remainder terms (third derivatives/Taylor expansion) often using entropy or concentration inequalities, and localization of the posterior onto high-probability high-mass sets (Castillo et al., 2013, Giordano et al., 2018, Lu, 2017).
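
As an illustration of the first ingredient, the following Monte Carlo check verifies the LAN expansion numerically in a simple exponential model (the model, sample size, and local direction $h$ are assumptions of the sketch): the log-likelihood ratio at $\theta_0 + h/\sqrt{n}$ should be close to $h\,\Delta_n - \tfrac{1}{2} h^2 I_0$ with $\Delta_n = n^{-1/2}\sum_i \dot\ell_{\theta_0}(X_i)$.

```python
import numpy as np

rng = np.random.default_rng(7)
theta0, n, h = 2.0, 5000, 1.3                    # true rate, sample size, local direction (assumed)
x = rng.exponential(scale=1.0 / theta0, size=n)  # Exp(theta0) data, log p_theta(x) = log(theta) - theta * x

def loglik(theta):
    return n * np.log(theta) - theta * x.sum()

# Local asymptotic normality: the log-likelihood ratio at theta0 + h / sqrt(n)
# should be close to h * Delta_n - 0.5 * h^2 * I_0, with
# Delta_n = n^{-1/2} * sum of scores at theta0 and I_0 = 1 / theta0^2.
llr = loglik(theta0 + h / np.sqrt(n)) - loglik(theta0)
score = 1.0 / theta0 - x                         # score function at theta0
delta_n = score.sum() / np.sqrt(n)
lan_approx = h * delta_n - 0.5 * h ** 2 / theta0 ** 2
print(f"log-likelihood ratio = {llr:.4f}, LAN expansion = {lan_approx:.4f}")
```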

7. Limitations, Extensions, and Open Problems

Some notable issues and frontiers:

  • In semiparametric mixture settings, the existence of explicit least-favorable submodels and their compatibility with the prior is nontrivial; the structure may fail in complex mixtures (Franssen et al., 29 Nov 2024).
  • Infinite-dimensional or nonparametric BvM results typically hold in weak topologies (bounded-Lipschitz), not in strong topologies like total variation, due to the singular nature of the limiting process (Castillo et al., 2013, Ray et al., 2020).
  • Extension to settings with dependent data, model misspecification, or non-log-concave posteriors is nontrivial and remains a focus of ongoing research (Spokoiny, 2013, Dehaene, 2019).
  • For joint (parameter, nuisance) posteriors and credible sets in the infinite-dimensional nuisance direction, joint BvM theorems typically fail or require strong additional regularization (Franssen et al., 29 Nov 2024, Rømer, 17 Sep 2024).


This synthesis represents the core principles, major results, and research landscape for the Bernstein–von Mises theorem and its extensions as rigorously established in the cited arXiv literature.
