Semiparametric Bernstein–von Mises Theorems

Updated 5 February 2026

The paper establishes that under appropriate priors and regularity, the marginal posterior for a finite-dimensional parameter is asymptotically Gaussian, generalizing the classical BvM theorem.
It employs techniques such as uniform LAN expansions, posterior contraction, and Laplace approximations to achieve precise error bounds even in ill-posed or irregular models.
Applications span partially linear regression, nonparametric regression, inverse problems, and mixture models, emphasizing enhanced Bayesian efficiency and uncertainty quantification.

A semiparametric Bernstein–von Mises (BvM) theorem describes the asymptotic normality of the marginal posterior for a finite-dimensional parameter in a model with an infinite-dimensional nuisance parameter, generalizing the classical parametric BvM. The semiparametric setting is essential in modern applications such as partially linear regression, nonparametric regression, inverse problems, diffusion models, and mixture models. Recent advances have also derived semiparametric BvM theorems for projection-based procedures, highly ill-posed inverse problems, finite-sample settings, and irregular “exponential-type” models.

1. Classical Setup and General Statement

Let $X_1,\dots,X_n \sim P_{\theta,\eta}$, where $\theta\in\Theta\subset\mathbb R^k$ is the finite-dimensional parameter of primary interest and $\eta$ is an infinite-dimensional nuisance parameter. A semiparametric BvM theorem asserts that, under appropriate priors and regularity conditions, the marginal posterior for $\theta$ is asymptotically Gaussian, with centering and covariance matching those of a frequentist efficient estimator of $\theta$: $\sup_B\left|\Pi_n\left(\sqrt{n}(\theta-\theta_0)\in B\mid X_1,\dots,X_n\right) - N_{\,\Delta_n,\,I_0^{-1}}(B)\right| \to 0$ in $P_0$-probability, that is, convergence in total variation. Here $I_0$ is the efficient information and $\Delta_n = \sqrt{n}(\hat\theta_n-\theta_0)$ for any efficient estimator $\hat\theta_n$ (e.g., the posterior mean) [(Bickel et al., 2010, Kleijn, 2013), a, (Chae, 2015, Chae et al., 2016)].

The standard proof route is via:

Posterior contraction for both $\theta$ and $\eta$;
Uniform LAN (local asymptotic normality) expansion for the likelihood (possibly after reparametrization to least-favorable submodels or efficient directions);
Marginalization and Laplace approximations to derive Gaussian posteriors for the low-dimensional parameter.

2. Key Assumptions and Methodological Innovations

Essential conditions for a semiparametric BvM theorem include:

LAN (or, in irregular models, LAE: local asymptotic exponentiality) and Taylor-type expansions for either the log-likelihood or the marginal likelihood in $\theta$, possibly along least-favorable or adaptive submodels [(Kleijn, 2013), a, (Castillo et al., 2013)].
Posterior contraction for the full parameter $(\theta,\eta)$ at appropriate rates (typically $O(n^{-1/2})$ for $\theta$, possibly slower for $\eta$).
Metric entropy or small-ball conditions on the nuisance parameter [a].
Prior “thickness” at the truth for $\theta$, and sufficient prior mass in Kullback–Leibler neighborhoods of $\eta_0$.
Change-of-measure invariance: the prior is approximately flat (or shift-invariant) under local shifts along the efficient score direction (Kleijn, 2013, Walker, 2023).
No-bias/orthogonality conditions between the parameter of interest and nuisance directions.

Several works show that strong (often impractical) prior-invariance conditions can be relaxed by reparametrization. For example, moving from $(\theta,f)$ to $(\theta,m)$ in the partially linear model allows independent priors and avoids bias (Walker, 2023). Further, Kleijn’s “approximate least-favorable submodel” approach bypasses the classical requirement of explicit least-favorable submodels (Kleijn, 2013).

3. Proof Strategies and Influence Function Calculation

The proof schemes universally exploit expansions of the integrated (marginal) likelihood and/or functionals of interest:

Marginal Likelihood Expansion: Uniform expansions in $h = \sqrt{n}(\theta-\theta_0)$,

$\log s_n(\theta_0 + n^{-1/2}h) - \log s_n(\theta_0) = h^\top \Delta_{n,\theta_0,\eta_0} - \frac{1}{2} h^\top I_0 h + o_{P_0}(1)$

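where $s_n(\theta)$ denotes the likelihood integrated over the nuisance prior, $\Delta_{n,\theta_0,\eta_0}$ is the score-type linear term of the expansion, and $I_0$ is the semiparametric efficient information [a, (Kleijn, 2013, Franssen et al., 2024)].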
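
As a worked step, the following sketch shows why the expansion above produces a Gaussian limit. It assumes (this is not stated in the text) that the prior density of $\theta$ is continuous and positive at $\theta_0$, so that it is effectively constant on the $n^{-1/2}$ scale, and that $I_0$ is invertible. Completing the square in $h$ gives

$h^\top \Delta_{n,\theta_0,\eta_0} - \frac{1}{2} h^\top I_0 h = -\frac{1}{2}\left(h - I_0^{-1}\Delta_{n,\theta_0,\eta_0}\right)^\top I_0 \left(h - I_0^{-1}\Delta_{n,\theta_0,\eta_0}\right) + \frac{1}{2}\Delta_{n,\theta_0,\eta_0}^\top I_0^{-1}\Delta_{n,\theta_0,\eta_0}$

The last term does not depend on $h$ and cancels on normalization, so the marginal posterior density of $h$ is approximately proportional to

$\exp\left\{-\frac{1}{2}\left(h - I_0^{-1}\Delta_{n,\theta_0,\eta_0}\right)^\top I_0 \left(h - I_0^{-1}\Delta_{n,\theta_0,\eta_0}\right)\right\}$

which is the $N(I_0^{-1}\Delta_{n,\theta_0,\eta_0},\, I_0^{-1})$ density. The covariance is the inverse efficient information, and the centering $I_0^{-1}\Delta_{n,\theta_0,\eta_0}$ plays the role of the centering sequence in the general statement of Section 1. Making the approximation rigorous in total variation is where uniformity of the expansion and posterior contraction enter.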

Functional Expansion: For smooth functionals $\psi(\eta)$, a Fréchet expansion

$\psi(\eta) = \psi(\eta_0) + \langle \phi_{\eta_0},\,\eta-\eta_0 \rangle + r(\eta,\eta_0)$

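where $\phi_{\eta_0}$ is the efficient influence function and $r(\eta,\eta_0)$ is a remainder term that must be asymptotically negligible where the posterior concentrates (Castillo et al., 2013, L'Huillier et al., 2023).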

Laplace- or Fourier-type arguments then show that the marginal posterior (for $\theta$ or $\psi(\eta)$) collapses onto the normal law with semiparametric information matching the frequentist lower bound [a, (Chae, 2015, Giordano et al., 2018)].
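
The functional case can be illustrated numerically with a minimal sketch. It uses the mean functional $\psi(P) = E_P[X]$, whose influence function is $x - E_P[X]$, and the Bayesian bootstrap (Dirichlet(1, ..., 1) weights on the observed points) as a stand-in nonparametric posterior. The choice of functional, the Bayesian bootstrap, the exponential data, and the helper name `bayesian_bootstrap_mean` are all assumptions made for illustration, not taken from the cited works.

```python
import numpy as np


def bayesian_bootstrap_mean(x, n_draws, rng):
    """Draw from the Bayesian-bootstrap posterior of the mean functional."""
    n = x.shape[0]
    # Each row is one posterior draw of the weights on the observed points.
    weights = rng.dirichlet(np.ones(n), size=n_draws)  # shape (n_draws, n)
    return weights @ x


rng = np.random.default_rng(0)
n = 500
x = rng.exponential(scale=2.0, size=n)  # skewed data, illustrative only
draws = bayesian_bootstrap_mean(x, n_draws=4000, rng=rng)

# BvM prediction: sqrt(n) * posterior sd is close to the sd of the
# influence function x - mean, i.e. the standard deviation of the data.
scaled_post_sd = np.sqrt(n) * draws.std(ddof=1)
influence_sd = x.std(ddof=0)
ratio = scaled_post_sd / influence_sd
print(f"sqrt(n) * posterior sd: {scaled_post_sd:.3f}")
print(f"influence-function sd:  {influence_sd:.3f}")
```

Under these assumptions the two printed numbers should nearly agree, and the posterior draws should be centered near the sample mean, which is the efficient estimator of this functional.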

Critically, the calculation of the efficient influence function is model-specific. In partially linear models, the efficient influence function for $\theta$ is obtained by subtracting the projection onto the tangent space of $f$ [a]; in inverse problems, it involves operator-theoretic projections against nuisance directions (Magra et al., 2023, Giordano et al., 2018); for functionals, it is the Riesz representer in the Hilbert tangent space (Castillo et al., 2013, L'Huillier et al., 2023, Giordano et al., 22 May 2025).
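
For the partially linear model $Y = \theta X + f(Z) + \varepsilon$, the projection step can be sketched as follows. Assuming homoscedastic Gaussian errors with known standard deviation $\sigma$ (an assumption made here, not in the text), the efficient information is $E[(X - E[X\mid Z])^2]/\sigma^2$. The sketch estimates the conditional means with crude binned averages and regresses residual on residual. The helper `binned_mean`, the simulated design, and the plug-in approach are hypothetical illustrations, not the procedure of any cited paper.

```python
import numpy as np


def binned_mean(z, v, n_bins=20):
    """Crude estimate of E[v | z]: average v within equal-width bins of z in [0, 1)."""
    idx = np.minimum((z * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(idx, weights=v, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return (sums / np.maximum(counts, 1))[idx]


rng = np.random.default_rng(1)
n, theta0, sigma = 2000, 2.0, 1.0
z = rng.uniform(size=n)
x = np.sin(2 * np.pi * z) + rng.normal(size=n)      # covariate depends on z
y = theta0 * x + z**2 + sigma * rng.normal(size=n)  # nuisance f(z) = z^2

# Project out the nuisance direction: residualize x and y on z.
x_res = x - binned_mean(z, x)
y_res = y - binned_mean(z, y)

theta_hat = (x_res @ y_res) / (x_res @ x_res)  # residual-on-residual slope
info_hat = np.mean(x_res**2) / sigma**2        # plug-in efficient information
approx_post_sd = 1.0 / np.sqrt(n * info_hat)   # BvM-predicted posterior sd
print(f"theta_hat = {theta_hat:.3f}, predicted posterior sd = {approx_post_sd:.4f}")
```

A BvM theorem for this model predicts that the marginal posterior of $\theta$ is approximately normal, centered near an efficient estimator such as `theta_hat`, with standard deviation close to `approx_post_sd`.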

4. Variants and Recent Progress

a. Projection-based and Shape-constrained Procedures

In "Semiparametric Bernstein-von Mises Phenomenon via Isotonized Posterior in Wicksell’s problem" (Gili et al., 21 Feb 2025), the BvM is proved for an isotonized inverse posterior (IIP): a Dirichlet process prior is placed on the distribution of the observables, and the resulting posterior is then $L^2$-projected onto the set of functions satisfying the monotonicity constraint. The limit variance involves the smoothness parameter $\gamma$ (from Hölder continuity) and reflects the mildly ill-posed nature of the underlying inverse problem: $\delta_n^{-1}(\widehat V_G(x)-\widehat V_n(x)) \mid Z_1^n \rightsquigarrow N\left(0,\, \frac{g_0(x)}{2\gamma}\right)$. In this setting the procedure attains the minimax rate for boundary recovery, and its credible intervals have asymptotically correct frequentist coverage.

b. Inverse Problems and Diffusions

Recent theorems address inverse problems in Hilbertian white-noise models (Giordano et al., 2018, Magra et al., 2023), parabolic PDEs (Magra et al., 31 Jan 2026), and SDE-based ergodic diffusions (Giordano et al., 22 May 2025). The general structure involves contraction to a shrinking tube, a LAN expansion in the appropriate directions, and operator-theoretic identification of the efficient information. In the diffusion setting, efficient estimators satisfy $\hat\Psi_T = \Psi(B_0) + \frac{1}{T} \int_0^T \nabla \varphi_L(X_t) \cdot dW_t + o_{P_0}(T^{-1/2})$, and the normalized posterior for $\Psi(B)$ converges to $N(0, I^{-1}_{\rm eff})$ (Giordano et al., 22 May 2025).

c. Semiparametric Mixtures

In mixture models with latent structure, e.g., frailty models and errors-in-variables, a semiparametric BvM holds for the finite-dimensional parameter $\theta$ with species sampling priors on the mixing distribution, if suitable LAN and posterior consistency are shown for the mixture model (Franssen et al., 2024).

d. Irregular (LAE-type) Models

Irregular problems (such as support-boundary or change-point estimation) fall outside the classical LAN class. Here, a Bernstein–von Mises theorem yields exponential, not normal, posterior limits at the faster rate $n$ instead of $\sqrt{n}$: $\Pi\left(n(\theta-\theta_0) \in A\mid X^{(n)}\right) \to \mathrm{Exp}_-(\lambda_0)$ with $\lambda_0$ the jump size; importantly, Bayesian point estimators attain minimax risk, while MLEs are inefficient (Kleijn et al., 2012, Kleijn, 2013).
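
A parametric toy case makes the exponential limit and the inefficiency of the MLE concrete. The sketch below assumes a shifted exponential model, $X_i = \theta_0 + E_i$ with $E_i \sim \mathrm{Exp}(1)$, so the density jumps by $\lambda_0 = 1$ at $\theta_0$, and a flat prior on $\theta$. Under these assumptions the posterior is proportional to $e^{n\theta}$ on $\theta \le X_{(1)}$, so $n(X_{(1)} - \theta)$ is exactly $\mathrm{Exp}(1)$ given the data. The model, prior, and variable names are illustrative choices, not the setting of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, theta0 = 50, 20000, 1.0

# Sample minimum of each replicated data set X_i = theta0 + Exp(1).
x_min = theta0 + rng.exponential(size=(reps, n)).min(axis=1)

mle = x_min              # maximum likelihood estimator (sample minimum)
bayes = x_min - 1.0 / n  # posterior mean under the flat prior

mse_mle = np.mean((mle - theta0) ** 2)
mse_bayes = np.mean((bayes - theta0) ** 2)
risk_ratio = mse_mle / mse_bayes

# Exact posterior draws for the first data set: theta = X_(1) - Exp(rate n).
post_draws = x_min[0] - rng.exponential(scale=1.0 / n, size=5000)
scaled = n * (x_min[0] - post_draws)  # Exp(1) distributed, up to Monte Carlo error

print(f"n^2 * MSE(MLE)   = {n**2 * mse_mle:.3f}")
print(f"n^2 * MSE(Bayes) = {n**2 * mse_bayes:.3f}")
```

Since $n(X_{(1)} - \theta_0)$ is $\mathrm{Exp}(1)$, its second moment is 2 and its variance is 1, so in this toy model the posterior mean should have roughly half the squared-error risk of the MLE, up to Monte Carlo error.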

e. Second-Order and Finite-Sample Theory

Second-order theory addresses the proximity of the finite-sample marginal posterior of $\theta$ to the normal limit, showing that the frequentist accuracy of Bayesian inference for $\theta$ is affected by the nonparametric contraction rate for the nuisance $\eta$ and possibly by semiparametric bias. Using carefully constructed dependent priors, adaptation and second-order efficiency can be achieved (Yang et al., 2015). Finite-sample theorems establish explicit error bounds of order $p^3/n$ for the marginal posterior, leading to the notion of a critical dimension (Panov et al., 2013).

5. Implementation, Algorithms, and Applications

Semiparametric BvM theorems have enabled:

Efficient uncertainty quantification in Bayesian nonparametric regression (including BART (Rockova, 2019), Gaussian process regression, and wavelet-based approaches (Castillo et al., 2013, Giordano et al., 2018));
Shape-constrained estimation in stereological models (Wicksell’s problem) using isotonized inverse posteriors (Gili et al., 21 Feb 2025);
Adaptive estimation in partially linear or regression models under minimal symmetry or smoothness conditions (Chae, 2015, Chae et al., 2016, Walker, 2023);
Efficient computation via conjugacy (e.g., Dirichlet Process mixtures in symmetric error density models); straightforward Gibbs sampling in partially linear models using the “Robinson transform” parametrization (Walker, 2023).

In all cases, the theorem justifies the asymptotic agreement of Bayesian credible sets with frequentist confidence sets, under regularity or self-similarity conditions (as for BART), shape constraints, and prior support conditions.
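
The agreement between credible and confidence sets can be checked by simulation. The following minimal sketch estimates the frequentist coverage of 95% equal-tailed credible intervals for a population mean. It assumes a Bayesian-bootstrap posterior (Dirichlet(1, ..., 1) weights on the data) and exponential data; both are illustrative choices, and the sample sizes are kept small so the script runs quickly.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, n_draws, true_mean = 200, 300, 400, 2.0

covered = 0
for _ in range(reps):
    x = rng.exponential(scale=true_mean, size=n)
    # Posterior draws of the mean functional for this data set.
    weights = rng.dirichlet(np.ones(n), size=n_draws)
    lo, hi = np.quantile(weights @ x, [0.025, 0.975])
    covered += int(lo <= true_mean <= hi)

coverage = covered / reps
print(f"empirical coverage of 95% credible intervals: {coverage:.3f}")
```

If the BvM approximation is accurate at this sample size, the empirical coverage should be near the nominal level, with some shortfall expected from the skewness of the data and from Monte Carlo error.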

6. Extensions, Open Problems, and Future Directions

Modern work has targeted:

Nonlinear and low-regularity functionals (necessitating higher-order expansions to control semiparametric bias (Castillo et al., 2013));
Models with hierarchical or adaptive priors, and the interaction between nuisance contraction and primary parameter efficiency (Yang et al., 2015);
Models where prior shift-invariance is hard to enforce, motivating reparametrization or dependent priors (Walker, 2023);
Extension to fractional posteriors and other decision-theoretic settings (L'Huillier et al., 2023);
Finite-sample accuracy and critical-dimension phenomena (Panov et al., 2013).

Remaining frontiers include full adaptation under minimal prior smoothness, handling highly ill-posed inverse problems, and irregular semiparametric problems beyond the current LAN/LAE dichotomy.

7. Summary Table: Key Advances in Semiparametric BvM Theory

Reference	Model Class	Prior Type / Innovation	Posterior Convergence Result	Target	Limit Law
(Bickel et al., 2010)	General semiparametric	Product prior, LAN, metric entropy	Asymptotic normality, total variation	Parametric	Gaussian
[a], (Kleijn, 2013)	Partially linear, location mixture, monotone	Gaussian/Dirichlet, approximate least-favorable	Asymptotic normality, total variation / exponential	Parametric / functional	Gaussian / Exponential
(Chae, 2015)	Symmetric error regression, random effects	Symmetrized DP mixture, Gibbs	Total variation	Parametric	Gaussian
(Magra et al., 31 Jan 2026)	Heat equation, PDE	GP on absorption, operator theory	Total variation	Parametric	Gaussian
(Gili et al., 21 Feb 2025)	Wicksell inverse, shape-constrained	DP on observables, $L^2$-projection	Centered and scaled	Functional	Gaussian
(Giordano et al., 2018)	Linear inverse, white noise	GP prior, Tikhonov regularization	Functional, total variation	Linear / nonlinear	Gaussian
(Giordano et al., 22 May 2025)	Reversible diffusion	GP/Besov-Laplace, SDEs	Functional, total variation	Nonlinear	Gaussian
(Franssen et al., 2024)	Semiparametric mixtures	DP/MFM mixture, least-favorable	Bounded-Lipschitz	Parametric	Gaussian
(Kleijn et al., 2012, Kleijn, 2013)	Irregular LAE	Product prior, location/scaling jumps	Total variation	Parametric	Exponential
(Yang et al., 2015)	Second order: PLR, Cox, GPLM	Dependent/independent priors	Rate + bias	Parametric	Gaussian plus remainder
(Panov et al., 2013)	Finite-sample, critical dimension	Product prior, Gaussian expansion	Explicit error, total variation	Parametric	Gaussian
(Rockova, 2019)	BART, nonparametric regression	Tree-based/histogram prior, adaptivity	Weak / total variation, functionals	Linear	Gaussian

This synthesis captures the scope, conditions, methodology, and depth of modern semiparametric Bernstein–von Mises theory. The results provide a rigorous, model-specific foundation for Bayesian efficiency and probabilistic uncertainty quantification in models with structured infinite-dimensional nuisance.