Posterior Contraction Rates

Updated 25 May 2026

A posterior contraction rate is the rate at which a Bayesian posterior concentrates around the true parameter, measured in a metric such as L2, Hellinger, or Wasserstein.
They serve as a critical benchmark in Bayesian nonparametric theory, validating inference approaches in high- and infinite-dimensional settings.
Establishing these rates involves verifying entropy and prior-mass conditions, constructing sieves, and leveraging dynamic methods such as the Benamou–Brenier formulation.

Posterior contraction rates quantify the asymptotic speed at which the Bayesian posterior distribution concentrates around the true data-generating parameter or function as the sample size increases. They are central to Bayesian nonparametric theory: they provide a frequentist benchmark for Bayesian learning and underpin uncertainty quantification for Bayesian inference in parametric as well as high- and infinite-dimensional models. The mathematical machinery for posterior contraction rates (abbreviated "PCRs" here; the abbreviation is the editor's) has evolved to include information- and transportation-based distances, particularly the Wasserstein metrics, and covers dominated, non-dominated, parametric, nonparametric, linear, and nonlinear settings, as well as computationally discrete ones.

1. Formal Definition and Conceptual Framework

Given a statistical model $\{P_\theta : \theta \in \Theta\}$ with parameter space $\Theta$ (which may be infinite-dimensional), observations $X^{(n)} = (X_1, \dots, X_n)$ of sample size $n$, and a prior $\Pi$ on $\Theta$, the posterior $\Pi_n(\cdot \mid X^{(n)})$ quantifies the conditional belief about $\theta$ after observing the data. A sequence $\varepsilon_n \to 0$ is a posterior contraction rate at $\theta_0$ if, for every $M_n \to \infty$,

$$\Pi_n\big(\theta \in \Theta : d(\theta, \theta_0) \ge M_n \varepsilon_n \mid X^{(n)}\big) \to 0 \quad \text{in } P_{\theta_0}\text{-probability},$$

where $d$ is an appropriate metric, such as $L^2$, Hellinger, or Wasserstein (Dolera et al., 2020, Dolera et al., 2022, Camerlenghi et al., 2022). This quantifies that posterior mass contracts around $\theta_0$ at rate $\varepsilon_n$.

In nonparametric and high-dimensional models, $d$ is often taken as a norm on function spaces or a probability metric (e.g., the Wasserstein-$p$ distance): $W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int \rho(x, y)^p \, \gamma(dx, dy) \right)^{1/p}$, with $\rho$ the ground metric and $\Gamma(\mu, \nu)$ the set of couplings of $\mu$ and $\nu$ (Dolera et al., 2022).
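To make the definition concrete, the following minimal sketch uses a conjugate Gaussian location model, which is an illustrative assumption and not a model taken from the cited papers. The function names, the N(0, 1) prior, and the unit noise variance are all hypothetical choices. It computes the posterior mass outside a ball around the truth and the 2-Wasserstein distance from the posterior to the Dirac mass at the truth.

```python
import math


def normal_posterior(n, xbar, prior_var=1.0, noise_var=1.0):
    """Posterior N(m, v) for N(theta, noise_var) data under a N(0, prior_var) prior."""
    precision = 1.0 / prior_var + n / noise_var
    v = 1.0 / precision
    m = v * (n * xbar / noise_var)
    return m, v


def mass_outside_ball(m, v, theta0, radius):
    """Posterior probability of {theta : |theta - theta0| >= radius} under N(m, v)."""
    sd = math.sqrt(v)

    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - m) / (sd * math.sqrt(2.0))))

    return 1.0 - (cdf(theta0 + radius) - cdf(theta0 - radius))


def w2_to_dirac(m, v, theta0):
    """2-Wasserstein distance between N(m, v) and the Dirac mass at theta0."""
    return math.sqrt((m - theta0) ** 2 + v)


if __name__ == "__main__":
    theta0 = 0.5
    for n in (10, 100, 1000, 10000):
        # Hypothetical sample mean: the truth plus a typical n^{-1/2} fluctuation.
        xbar = theta0 + 1.0 / math.sqrt(n)
        m, v = normal_posterior(n, xbar)
        eps_n = n ** -0.5
        print(n, mass_outside_ball(m, v, theta0, 5.0 * eps_n), w2_to_dirac(m, v, theta0))
```

In this toy setting both quantities can be tracked as $n$ grows, which is the behavior the definition formalizes with radius $M_n \varepsilon_n$.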

2. Posterior Contraction in Dominated and Non-dominated Models

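Dominated models: Classical PCRs exploit Bayes' formula; for i.i.d. observations and measurable $B \subseteq \Theta$, the posterior can be written as

$$\Pi_n(B \mid X^{(n)}) = \frac{\int_B \prod_{i=1}^n f(X_i \mid \theta)\, \Pi(d\theta)}{\int_\Theta \prod_{i=1}^n f(X_i \mid \theta)\, \Pi(d\theta)},$$

where $f(\cdot \mid \theta)$ is a density with respect to a dominating measure (Dolera et al., 2020).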
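As a small illustration of Bayes' formula in a dominated model, the sketch below evaluates the posterior on a grid for Bernoulli data under a uniform prior. The model, the grid size, and the function names are assumptions chosen for illustration only; the grid sum stands in for the integrals in the numerator and denominator.

```python
def grid_posterior(successes, trials, grid_size=2001):
    """Posterior weights on a uniform grid over [0, 1] for Bernoulli data, uniform prior."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Numerator of Bayes' formula: likelihood times the (constant) prior density.
    weights = [t ** successes * (1.0 - t) ** (trials - successes) for t in thetas]
    # Denominator: the normalizing constant, approximated by a grid sum.
    total = sum(weights)
    return thetas, [w / total for w in weights]


def posterior_mean(thetas, probs):
    """Mean of the discretized posterior."""
    return sum(t * p for t, p in zip(thetas, probs))


if __name__ == "__main__":
    thetas, probs = grid_posterior(successes=7, trials=10)
    print(posterior_mean(thetas, probs))
```

For this conjugate case the exact posterior is Beta(successes + 1, trials - successes + 1), so the grid result can be checked against the closed-form mean.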

Non-dominated models: In many Bayesian nonparametric constructions (e.g., Dirichlet process mixtures, normalized random measures), no single dominating measure $\lambda$ exists. In this case, one works with posterior kernels $\Pi_n(\cdot \mid x_1, \dots, x_n)$ arising from disintegration (de Finetti decomposition), and PCRs are defined using, for example, the $p$-Wasserstein metric $W_p$ on $\mathcal{P}(\Theta)$, the space of probability measures on $\Theta$ (Camerlenghi et al., 2022). The formal PCR is then:

$$\varepsilon_n = \mathbb{E}\big[ W_p\big( \Pi_n(\cdot \mid X_1, \dots, X_n),\ \delta_{\theta_0} \big) \big],$$

which quantifies the expected $p$-Wasserstein distance of the random posterior to the Dirac mass at $\theta_0$.
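The expected Wasserstein distance to a Dirac mass connects back to the mass-outside-a-ball definition of Section 1. The short derivation below is a sketch under notational assumptions: $\mu$ stands for the (random) posterior, $d$ for the metric on $\Theta$, $p \ge 1$, and $M > 0$ is a constant.

```latex
% The only coupling of \mu with a Dirac mass is the product measure, so
W_p^p(\mu, \delta_{\theta_0}) = \int_\Theta d(\theta, \theta_0)^p \, \mu(d\theta).

% Markov's inequality then bounds the mass outside a ball of radius M\varepsilon:
\mu\big(\{\theta : d(\theta, \theta_0) \ge M \varepsilon\}\big)
  \le \min\!\left\{ 1, \left( \frac{W_p(\mu, \delta_{\theta_0})}{M \varepsilon} \right)^{p} \right\}
  \le \frac{W_p(\mu, \delta_{\theta_0})}{M \varepsilon},
% using \min\{1, x^p\} \le x for x \ge 0 and p \ge 1.

% Taking expectations with \varepsilon_n = \mathbb{E}[W_p(\mu, \delta_{\theta_0})]:
\mathbb{E}\Big[ \mu\big(\{\theta : d(\theta, \theta_0) \ge M \varepsilon_n\}\big) \Big] \le \frac{1}{M}.
```

So a bound on the expected Wasserstein distance yields vanishing expected posterior mass outside balls of radius $M \varepsilon_n$ as $M$ grows.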

3. Methodologies for Establishing Posterior Contraction Rates

The general theoretical strategy for establishing PCRs involves verifying a combination of:

Entropy (complexity) conditions: Covering numbers or metric entropy of the effective parameter space $\Theta_n$ at scale $\varepsilon_n$, e.g. $\log N(\varepsilon_n, \Theta_n, d) \lesssim n \varepsilon_n^2$ (Gao et al., 2015, Finocchio et al., 2021, Oh et al., 12 May 2026).
Prior-mass (small ball) conditions: Sufficient prior mass near the truth, i.e. $\Pi\big(B_{\mathrm{KL}}(\theta_0, \varepsilon_n)\big) \ge e^{-C n \varepsilon_n^2}$ for a Kullback–Leibler neighborhood $B_{\mathrm{KL}}(\theta_0, \varepsilon_n)$ of $\theta_0$ and a constant $C > 0$ (Gao et al., 2015, Camerlenghi et al., 2022).
Testing and concentration inequalities: Existence of exponentially powerful tests of $\theta_0$ against alternatives at distance of order $\varepsilon_n$ (Reiss et al., 2017, Fan et al., 25 Jan 2026, Dolera et al., 2024).
Sieve construction: In nonparametric models or under weak regularity, "sieves" (compact or entropy-controlled sets) inside which the entropy and testing conditions hold, with negligible prior mass outside (Camerlenghi et al., 2022).
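The entropy condition is typically used by balancing metric entropy against $n \varepsilon_n^2$. As a hedged illustration, the sketch below assumes the textbook entropy scaling $\varepsilon^{-d/\beta}$ for a $\beta$-smooth Hölder ball in dimension $d$ (an assumption for illustration, not a bound taken from the cited papers) and solves the balance equation numerically. All function names are hypothetical.

```python
import math


def entropy_balance_rate(n, beta, dim, lo=1e-12, hi=1.0, iters=200):
    """Solve eps**(-dim/beta) = n * eps**2 for eps by bisection on a log scale."""

    def gap(eps):
        # log(entropy) - log(n * eps^2); strictly decreasing in eps.
        return -(dim / beta) * math.log(eps) - (math.log(n) + 2.0 * math.log(eps))

    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if gap(mid) > 0:
            lo = mid  # entropy still dominates: the root lies to the right
        else:
            hi = mid
    return math.sqrt(lo * hi)


def closed_form_rate(n, beta, dim):
    """Closed-form solution n^{-beta / (2 beta + dim)} of the same balance equation."""
    return n ** (-beta / (2.0 * beta + dim))


if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        print(n, entropy_balance_rate(n, beta=2.0, dim=1), closed_form_rate(n, 2.0, 1))
```

The numerical root agrees with the familiar nonparametric rate $n^{-\beta/(2\beta+d)}$, which is why entropy and prior-mass conditions of order $n \varepsilon_n^2$ lead to rates of this form.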

Recent developments leverage the Wasserstein metric and the dynamic Benamou–Brenier formulation (Dolera et al., 2020, Dolera et al., 2022), providing two novelties:

Avoidance of sieves in certain contexts: PCRs can be established in strong metrics via the dynamic formulation (Dolera et al., 2020, Dolera et al., 2022, Dolera et al., 2024).
Direct connection to empirical process rates and Poincaré inequalities: PCRs are linked to rates of convergence in the empirical measure (Glivenko–Cantelli), Sanov's large deviation principle in Wasserstein distance, and the estimation of weighted Poincaré–Wirtinger constants (Dolera et al., 2022).
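For reference, the dynamic formulation mentioned above can be stated as follows in the standard Euclidean setting. The symbols $\mu_t$ (a curve of probability measures) and $v_t$ (a velocity field) are notational assumptions, and this is the classical Benamou–Brenier identity rather than the specific variant used in the cited papers.

```latex
W_2^2(\mu_0, \mu_1)
  = \inf_{(\mu_t, v_t)} \left\{ \int_0^1 \!\! \int |v_t(x)|^2 \, \mu_t(dx) \, dt
    \;:\; \partial_t \mu_t + \nabla \cdot (v_t \mu_t) = 0,\;
    \mu_{t=0} = \mu_0,\; \mu_{t=1} = \mu_1 \right\}
```

The static infimum over couplings is replaced by an infimum over curves that transport $\mu_0$ to $\mu_1$ with minimal kinetic energy, subject to the continuity equation.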

4. Quantitative Examples in Parametric, Nonparametric, and Infinite-dimensional Models

Regular parametric models

For regular finite-dimensional models, the posterior contracts at the optimal parametric rate $\varepsilon_n \asymp n^{-1/2}$ for any prior with positive, continuous density at $\theta_0$ (Dolera et al., 2022).
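A quick way to see the parametric $n^{-1/2}$ scaling is a conjugate Beta-Bernoulli model, used here purely as an illustrative assumption: the posterior standard deviation multiplied by $\sqrt{n}$ stabilizes as $n$ grows. The function name and the uniform Beta(1, 1) prior are hypothetical choices.

```python
import math


def scaled_posterior_sd(successes, trials, a0=1.0, b0=1.0):
    """sqrt(n) times the posterior standard deviation under a Beta(a0, b0) prior."""
    a = a0 + successes
    b = b0 + trials - successes
    # Variance of a Beta(a, b) distribution.
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return math.sqrt(trials * var)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # Hypothetical data: exactly half of the trials are successes.
        print(n, scaled_posterior_sd(n // 2, n))
```

With half of the trials successful, the scaled value approaches $\sqrt{\theta(1-\theta)} = 0.5$, consistent with posterior spread of order $n^{-1/2}$.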

Nonparametric Dirichlet-Laplace mixtures

For location mixtures $p_G(x) = \int f(x - z)\, dG(z)$ of the Laplace density $f$ with a Dirichlet process prior on the mixing distribution $G$, Gao and van der Vaart (Gao et al., 2015) derive contraction rates for the mixing distribution and for the density, matching minimax lower bounds up to log factors.

Deep Gaussian process priors and compositional classes

Under deep GP priors for functions expressed as compositions $f = g_q \circ \cdots \circ g_0$ with layerwise Hölder or Besov regularity, the posterior contracts at a rate that achieves minimax adaptivity to the unknown compositional structure (Finocchio et al., 2021).

Besov-Laplace priors in white noise

For functions whose regularity is matched by the smoothness of a Besov-Laplace prior, the strong posterior contraction rate in the Sobolev norm matches minimax lower bounds (Dolera et al., 2024).

High-dimensional and nonparametric sparsity

For regression with spike-and-slab or shrinkage priors, the rate is

$$\varepsilon_n \asymp \sqrt{\frac{s_0 \log p}{n}},$$

where $s_0$ is the true sparsity and $p$ is the number of predictors (Zhang et al., 2019, Naveau et al., 2024).
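To give a feel for the orders of magnitude in sparse regression, the sketch below evaluates the commonly cited sparse rate $\sqrt{s_0 \log p / n}$, with $s_0$ the true sparsity, $p$ the number of predictors, and $n$ the sample size. That this is the exact form intended above is an assumption, constants are ignored, and the function names are hypothetical.

```python
import math


def sparse_rate(s0, p, n):
    """Order of the sparse contraction rate sqrt(s0 * log(p) / n), constants ignored."""
    return math.sqrt(s0 * math.log(p) / n)


def samples_for_rate(s0, p, target):
    """Smallest n for which sparse_rate(s0, p, n) is at most the target value."""
    return math.ceil(s0 * math.log(p) / target ** 2)


if __name__ == "__main__":
    # Hypothetical problem sizes: 10 active predictors out of 10,000.
    for n in (100, 1000, 10000):
        print(n, sparse_rate(10, 10000, n))
    print(samples_for_rate(10, 10000, 0.1))
```

The dependence on $p$ is only logarithmic, which is why contraction remains possible when the number of predictors far exceeds the sample size.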

Non-dominated nonparametric models

For Dirichlet process (DP) or normalized Gamma process priors in non-dominated settings (i.e., not absolutely continuous in the parameter), the contraction rate in Wasserstein distance is bounded by the sum of two terms (Camerlenghi et al., 2022): the first is the expected Wasserstein convergence rate of the empirical measure, and the second depends on the metric entropy and prior concentration.
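The empirical-measure term can be explored numerically in one dimension, where the optimal coupling simply pairs order statistics. The sketch below is an illustrative assumption, not the construction of the cited paper: it compares a Uniform(0, 1) sample with an equally sized set of midpoint quantiles standing in for the true distribution. The function names are hypothetical.

```python
import random


def wasserstein_1d(xs, ys, p=1):
    """W_p between two empirical measures on the real line with equally many atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    # In one dimension the optimal coupling pairs order statistics.
    cost = sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


def uniform_quantiles(n):
    """Midpoint quantiles of Uniform(0, 1), an n-atom stand-in for the true law."""
    return [(i + 0.5) / n for i in range(n)]


if __name__ == "__main__":
    random.seed(0)
    for n in (100, 1000, 10000):
        sample = [random.random() for _ in range(n)]
        print(n, wasserstein_1d(sample, uniform_quantiles(n)))
```

Running the loop for increasing $n$ shows how the distance between the empirical measure and the truth behaves, which is the quantity that drives the first term of the bound.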

5. Analytical Structure: Wasserstein Dynamics, Glivenko–Cantelli, and Poincaré Constants

A central methodological innovation is the combination of:

Local Lipschitz-continuity of the posterior: If $W_p\big(\Pi_n(\cdot \mid s), \Pi_n(\cdot \mid s')\big) \le L\, \rho(s, s')$ for sufficient statistics or empirical measures $s, s'$, with $\rho$ a distance between them and $L$ a local Lipschitz constant, then PCRs can be controlled by data fluctuation rates (Dolera et al., 2020, Dolera et al., 2022).
Dynamic Benamou–Brenier formulation of $W_2$: The infimum over transport plans is reframed as an infimum over absolutely continuous curves of probability measures $(\mu_t)_{t \in [0,1]}$ solving the continuity equation with minimal kinetic energy (Dolera et al., 2022).
Laplace method and weighted Poincaré–Wirtinger constants: Asymptotics in both finite and infinite dimension rely on Laplace expansions and spectral gap estimates for