
Wasserstein-Cramér-Rao Theory

Updated 15 November 2025
  • Wasserstein-Cramér-Rao theory is a framework that recasts the limits of unbiased estimation by using sensitivity, defined via the 2-Wasserstein Riemannian metric, instead of variance.
  • It establishes a new Cramér-Rao-type bound—the WCR bound—by leveraging the Wasserstein information matrix and matrix Cauchy–Schwarz inequality on estimator gradients.
  • The theory identifies conditions under which transport families and e-geodesics yield sensitivity-efficient estimators, with applications in robust estimation under additive noise.

Wasserstein-Cramér-Rao theory recasts fundamental limits of unbiased statistical estimation by replacing the variance—traditionally analyzed via the Fisher-Rao geometry—with a new notion of sensitivity defined through the 2-Wasserstein Riemannian structure. Sensitivity quantifies the instability of an estimator under infinitesimal additive perturbations rather than resampling variability, leading to an alternative Cramér-Rao-type bound known as the Wasserstein-Cramér-Rao (WCR) lower bound. This framework enables rigorous characterization and attainment criteria for estimators optimized for sensitivity, identifies analogues of exponential families called transport families or e-geodesics, and clarifies the roles of Wasserstein projection estimators and their asymptotic properties.

1. Classical and Wasserstein Geometries in Estimation Theory

In classical parametric estimation, the uncertainty of an unbiased estimator $T_n$ for a parameter $\chi(\theta)$ is captured by its variance

$$\operatorname{Var}_\theta(T_n) = \mathbb{E}_\theta\big[(T_n - \chi(\theta))(T_n - \chi(\theta))^\top\big].$$

Under differentiability in quadratic mean (DQM), the score function $G_\theta(x)$ and the Fisher information matrix $I(\theta)$ form the basis of the Cramér-Rao bound:

$$\operatorname{Cov}_\theta(T_n) \succeq \frac{1}{n}\, D\chi(\theta)^\top I(\theta)^{-1} D\chi(\theta).$$

The underlying geometry relies on the Fisher-Rao (equivalently, Hellinger) Riemannian metric on the space of probability laws.

Wasserstein-Cramér-Rao theory pivots to the geometry induced by the 2-Wasserstein metric on $P_2(\mathbb{R}^d)$, the space of probability measures with finite second moment. The 2-Wasserstein distance,

$$W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int \|x - y\|^2 \, d\pi(x, y),$$

admits a formal Riemannian structure, and infinitesimal perturbations of the data are analyzed via transport maps and associated linearizations.
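
In one dimension the infimum in this definition is attained by the monotone coupling, so for two equal-size samples $W_2^2$ reduces to matching sorted values. A minimal numerical sketch (the helper name `w2_empirical_1d` is illustrative, not from the source):

```python
import numpy as np

def w2_empirical_1d(x, y):
    """Squared 2-Wasserstein distance between two equal-size empirical
    measures on the line: the optimal coupling pairs sorted samples."""
    x, y = np.sort(x), np.sort(y)
    return float(np.mean((x - y) ** 2))

# Two point masses each: W2^2 = ((0-2)^2 + (1-3)^2) / 2 = 4
d = w2_empirical_1d(np.array([0.0, 1.0]), np.array([3.0, 2.0]))
print(d)  # 4.0
```

The sorted matching is exact only on the line; in higher dimensions the coupling must be found by solving an optimal transport problem.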

2. Sensitivity, Dirichlet Energy, and Wasserstein Information

The sensitivity of an estimator $T_n$ to additive noise is defined by

$$\operatorname{Sen}_{\theta,\varepsilon}(T_n) = \mathbb{E}_\theta \left| \frac{T_n(X') - T_n(X)}{\varepsilon} \right|^2,$$

where $X'_i = X_i + \varepsilon \xi_i$ with $\xi_i \sim \mathcal{N}(0, I_d)$. A formal expansion as $\varepsilon \to 0$ gives the limiting Dirichlet energy

$$\operatorname{Sen}_\theta(T_n) := \mathbb{E}_\theta \left[ \sum_{i=1}^n \|\nabla_{x_i} T_n(X)\|^2 \right].$$
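
This Dirichlet energy can be approximated by central finite differences; the sketch below (with an illustrative helper `sensitivity`) recovers the value $1/n$ for the sample mean, whose partial derivatives are all $1/n$:

```python
import numpy as np

def sensitivity(T, X, h=1e-6):
    """Approximate Sen(T) = sum_i (dT/dx_i)^2 at one sample X (d = 1)
    by central finite differences."""
    s = 0.0
    for i in range(len(X)):
        Xp, Xm = X.copy(), X.copy()
        Xp[i] += h
        Xm[i] -= h
        s += ((T(Xp) - T(Xm)) / (2 * h)) ** 2
    return s

rng = np.random.default_rng(0)
X = rng.normal(size=20)
# Each partial derivative of the mean is 1/n, so Sen = n * (1/n)^2 = 1/n.
print(sensitivity(np.mean, X))  # ~ 0.05
```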

The Wasserstein transport linearization for a smoothly varying model $P_\theta$ is

$$\Phi_\theta(x) = \left. \frac{\partial}{\partial t} \left[ t_{P_\theta \rightarrow P_{\theta + t}}(x) - x \right] \right|_{t=0},$$

with the Wasserstein information matrix

$$J(\theta) = \mathbb{E}_\theta \big[ \Phi_\theta(X)^\top \Phi_\theta(X) \big].$$

This matrix plays an analogous role to Fisher information, quantifying the response in Wasserstein geometry to infinitesimal parameter changes.
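To see how $J(\theta)$ measures the metric response, one can check numerically that $W_2^2(P_\theta, P_{\theta+t}) \approx t^2 J(\theta)$ for small $t$. A sketch for the uniform scale family $\operatorname{Unif}[0,\theta]$ (which reappears in §6), where $\Phi_\theta(x) = x/\theta$ and $J(\theta) = 1/3$:

```python
import numpy as np

# For Unif[0, theta] the quantile function is F_theta^{-1}(u) = theta * u,
# so W2^2(P_theta, P_{theta+t}) = integral_0^1 (t u)^2 du = t^2 / 3.
# The second-order coefficient is the Wasserstein information J(theta) = 1/3,
# agreeing with E[Phi_theta(X)^2] = E[(X/theta)^2].
theta, t = 2.0, 1e-3
u = (np.arange(10_000) + 0.5) / 10_000               # midpoint grid on (0, 1)
w2_sq = np.mean((theta * u - (theta + t) * u) ** 2)  # quantile-space L2 distance
print(w2_sq / t**2)  # ~ 1/3
```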

3. The Wasserstein-Cramér-Rao Inequality

A key result establishes, for $T_n$ unbiased for $\chi(\theta)$, that

$$\operatorname{Cos}_\theta(T_n) \equiv \mathbb{E}_\theta\left[ \sum_{i=1}^n \nabla_{x_i} T_n(X)^\top \nabla_{x_i} T_n(X) \right] \succeq \frac{1}{n}\, D\chi(\theta)^\top J(\theta)^{-1} D\chi(\theta),$$

and, in scalar parameter cases,

$$\operatorname{Sen}_\theta(T_n) \geq \frac{(\chi'(\theta))^2}{n\, J(\theta)}.$$

This is the Wasserstein–Cramér–Rao lower bound, which governs the fundamental limit of estimator sensitivity under infinitesimal additive perturbation noise (Trillos et al., 10 Nov 2025, Nishimori et al., 15 Jun 2025). The result follows via formal Riemannian geometry reasoning and a matrix Cauchy–Schwarz inequality applied to gradients of the estimator and the Wasserstein score.
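
For the Gaussian location family the Cauchy–Schwarz mechanism behind the bound can be seen directly: any unbiased weighted mean $\sum_i w_i X_i$ has sensitivity $\sum_i w_i^2 \geq 1/n$, with equality only at uniform weights. A small sketch (the random weighting is illustrative):

```python
import numpy as np

# Gaussian location N(theta, 1): J(theta) = 1 and chi(theta) = theta,
# so the WCR bound is 1/n.  A weighted mean sum_i w_i X_i with sum w_i = 1
# is unbiased with grad_{x_i} = w_i, hence Sen = sum_i w_i^2 >= 1/n by
# Cauchy-Schwarz, with equality only for the sample mean (w_i = 1/n).
n = 20
rng = np.random.default_rng(2)
w = rng.dirichlet(np.ones(n))        # a random set of unbiased weights
sen_w = float(np.sum(w ** 2))
sen_mean = n * (1 / n) ** 2          # the sample mean's sensitivity
print(abs(sen_mean - 1 / n) < 1e-12)  # True: the mean attains the bound
print(sen_w > 1 / n)                  # True: non-uniform weights exceed it
```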

4. Exact Efficiency: Transport Families and E-geodesics

A model $P_\theta$ is termed a transport family if there exist a potential $\phi: \mathbb{R}^d \rightarrow \mathbb{R}^k$ and a parameterization $\chi: \Theta \rightarrow \mathbb{R}^k$ such that

$$\Phi_\theta(x) = D\phi(x)\, [\Lambda(\theta)]^{-1} D\chi(\theta)^\top,$$

where $\Lambda(\theta) = \mathbb{E}_\theta [ D\phi(X)^\top D\phi(X) ]$ is invertible.

Sensitivity-efficient estimators $T_n$ are characterized by attaining the WCR bound with equality, i.e.,

$$\operatorname{Cos}_\theta(T_n) = \frac{1}{n}\, D\chi(\theta)^\top J(\theta)^{-1} D\chi(\theta),$$

for all $\theta$. In transport families, the estimator

$$T_n(X_1,\ldots,X_n) = \frac{1}{n} \sum_{i=1}^n \phi(X_i)$$

is unbiased for $\chi(\theta)$ (up to a constant) and is sensitivity-efficient.
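
Equality in the WCR bound for a transport family can be checked by simulation. A sketch for the uniform scale family of §6, with the plug-in estimator scaled so it is unbiased for $\chi(\theta) = \theta^2/2$ (the factor $3/(2n)$ follows from $\mathbb{E}[X^2] = \theta^2/3$):

```python
import numpy as np

# Unif[0, theta]: Phi_theta(x) = x/theta and J(theta) = 1/3.
# T_n = (3 / (2n)) * sum(X_i^2) is unbiased for chi(theta) = theta^2 / 2,
# with grad_{x_i} T_n = 3 x_i / n, so
#   Sen(T_n) = (9 / n^2) E[sum_i X_i^2] = 3 theta^2 / n,
# which equals the WCR bound chi'(theta)^2 / (n J) = theta^2 / (n / 3).
rng = np.random.default_rng(5)
theta, n, reps = 2.0, 10, 100_000
X = rng.uniform(0, theta, size=(reps, n))
sen_mc = float(np.mean(np.sum((3 * X / n) ** 2, axis=1)))  # Monte Carlo Sen
bound = 3 * theta**2 / n
print(sen_mc, bound)  # both ~ 1.2
```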

In one-parameter models, e-geodesics (in the sense of the Otto metric) further require that the Wasserstein score not depend on $\theta$ after a monotone reparameterization. The existence of exact Wasserstein-efficient estimators is deeply connected to the geometry of transport families and e-geodesics, in contrast to the broader scope of exponential families in Fisher-Rao theory (Nishimori et al., 15 Jun 2025).

5. Asymptotic Efficiency and the Wasserstein Projection Estimator

Given $X_1, \ldots, X_n \sim P_{\theta^*}$, the Wasserstein projection estimator (WPE) is defined as

$$\hat{\theta}_n = \operatorname{arg\,min}_{\theta \in \Theta} W_2^2\big(P_\theta, \bar{P}_n\big),$$

with $\bar{P}_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ the empirical measure.

Definition: the WPE $\hat{\theta}_n$ is asymptotically sensitivity-efficient if

$$n \sum_{i=1}^n \|\nabla_{x_i} \hat{\theta}_n(X)\|^2 \to J(\theta^*)^{-1}$$

in $P_{\theta^*}$-probability.
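
In $d = 1$ the WPE can be computed by matching quantile functions, since $W_2^2(P_\theta, \bar{P}_n) = \int_0^1 (F_\theta^{-1}(u) - F_n^{-1}(u))^2\,du$. A sketch for the Gaussian location family $N(\theta, 1)$, where the quadratic in $\theta$ is minimized by the average quantile gap and the result essentially coincides with the sample mean; the helper name `wpe_location` is illustrative:

```python
import numpy as np
from statistics import NormalDist

def wpe_location(X, grid=2000):
    """WPE for N(theta, 1) in d = 1: minimize the quantile-space L2
    distance between F_theta^{-1}(u) = theta + Phi^{-1}(u) and the
    empirical quantile function.  The minimizer of the quadratic in
    theta is the average gap between the two quantile functions."""
    u = (np.arange(grid) + 0.5) / grid                  # midpoint grid on (0, 1)
    z = np.array([NormalDist().inv_cdf(v) for v in u])  # standard normal quantiles
    Fn_inv = np.quantile(X, u)                          # empirical quantiles
    return float(np.mean(Fn_inv - z))

rng = np.random.default_rng(3)
X = rng.normal(loc=1.5, size=500)
print(wpe_location(X), float(np.mean(X)))  # nearly identical
```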

In univariate settings ($d = 1$) with a smooth quantile map $u \mapsto F_\theta^{-1}(u)$,

$$n \sum_i (\partial \hat{\theta}_n / \partial x_i)^2 \to J(\theta^*)^{-1},$$

and, with bounded support and positive density,

$$\sqrt{n}\,(\hat{\theta}_n - \theta^*) \to \mathcal{N}(0, \Sigma(\theta^*)),$$

where $\Sigma(\theta^*)$ equals the asymptotic sensitivity bound. For multidimensional cases, analogous asymptotic results require stronger regularity (envelope-theorem-style arguments), but the theory generalizes in principle (Trillos et al., 10 Nov 2025).

6. Examples: Gaussian, Uniform, Laplace, Regression, and Pareto Families

A selection of models illustrates where variance- and sensitivity-efficiency coalesce or diverge.

  • Gaussian location ($d = 1$):

$P_\theta = N(\theta, 1)$. Here $\Phi_\theta(x) = 1$ and $J(\theta) = 1$. Both the classical and Wasserstein bounds equal $1/n$, and the sample mean $\bar{X}$ attains both exactly.

  • Uniform scale on $[0, \theta]$:

$P_\theta = \operatorname{Unif}[0,\theta]$, with transport map $t_{P_{\theta_0}\to P_{\theta_1}}(x) = x \theta_1 / \theta_0$, score $\Phi_\theta(x) = x/\theta$, and $J(\theta) = 1/3$. The only unbiased $\chi$ admitting exact sensitivity-efficiency is $\chi(\theta) = \frac{1}{2}\theta^2$; since $\mathbb{E}_\theta[X^2] = \theta^2/3$, the plug-in estimator $T_n = \frac{3}{2n} \sum X_i^2$ is unbiased for it and attains the WCR bound. The delta method applied to $\big(\frac{3}{n}\sum X_i^2\big)^{1/2}$ then yields an asymptotically optimal estimator for $\theta$ itself.

  • Laplace location:

The sample median has $O(1)$ sensitivity (constant order), whereas the sample mean has $1/n$ sensitivity, making the mean sensitivity-efficient though not variance-efficient.

  • Linear regression (fixed design):

Ordinary least squares (OLS) is sensitivity-efficient.

  • Pareto families:

Specific L-statistics achieve exact W-efficiency for certain parameterizations.
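
The Laplace comparison above is easy to check numerically: for odd $n$ the sample median is a single order statistic, so exactly one partial derivative equals one, giving $\operatorname{Sen} = 1$, while the mean's derivatives are all $1/n$. A finite-difference sketch (the helper is illustrative):

```python
import numpy as np

def sensitivity(T, X, h=1e-6):
    """Finite-difference Dirichlet energy sum_i (dT/dx_i)^2 at X (d = 1)."""
    s = 0.0
    for i in range(len(X)):
        Xp, Xm = X.copy(), X.copy()
        Xp[i] += h
        Xm[i] -= h
        s += ((T(Xp) - T(Xm)) / (2 * h)) ** 2
    return s

rng = np.random.default_rng(4)
n = 21                            # odd, so the median is one order statistic
X = rng.laplace(size=n)
print(sensitivity(np.median, X))  # ~ 1: only the middle point moves the median
print(sensitivity(np.mean, X))    # ~ 1/21: every point contributes (1/n)^2
```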

A summary table:

| Model | Sensitivity-efficient estimator | Sensitivity bound |
| --- | --- | --- |
| Gaussian (location) | Sample mean $\bar{X}$ | $1/n$ |
| Uniform scale $[0,\theta]$ | $T_n = \frac{3}{2n}\sum X_i^2$ (for $\chi = \frac{1}{2}\theta^2$) | $3\theta^2/n$ (with $J = 1/3$, scalar case) |
| Laplace (location) | Sample mean | $O(1/n)$ |
| Linear regression | OLS | Computed via the Wasserstein information matrix |

7. Context, Limitations, and Relation to Broader Theory

Wasserstein-Cramér-Rao theory is motivated by the need to analyze estimator instability outside the scope of resampling variability. Sensitivity—Dirichlet energy of the estimator—captures the reaction to infinitesimal additive noise and aligns with important practical settings: measurement error models, local differential privacy (noise injection), and distributionally robust optimization in Wasserstein ambiguity sets.

The sensitivity bound parallels the classical theory: while the Fisher-Rao geometry yields variance-based Cramér-Rao limits, the Otto/Wasserstein geometry yields sensitivity limits—sometimes revealing new optimality properties (e.g., of certain L-statistics), sometimes showing that classical estimators such as the sample mean retain optimality, but not invariably (the MLE in non-Gaussian location models may fail to be sensitivity-optimal).

A plausible implication is that transport geometry may admit further generalization to nonparametric or finite-sample settings, or to alternative instability measures under different metrics (total variation, Sobolev). However, explicit sensitivity-efficient estimators in multivariate and curved/complex families remain an open area, with location-scale families and products of e-geodesics forming the main tractable cases explored thus far. Extensions to higher-order asymptotics, transport-entropic divergences, and generalized transport-exponential structures are identified as promising research directions (Trillos et al., 10 Nov 2025, Nishimori et al., 15 Jun 2025).

8. Comparison with Fisher Information and Exponential Families

Classically, exponential families admit exact attainment of the Cramér-Rao bound for variance, with a wide array of tractable models. In Wasserstein theory, transport families and e-geodesics play a similar role—though the geometric restrictions imposed by optimal transport severely reduce the generality of cases admitting exact sensitivity-efficient estimators. This suggests a sharper dichotomy between variance and sensitivity as robustness criteria, contributing to the broader agenda of statistical inference under geometric and transport-theoretic principles.

Further context is provided by contemporary work (Ay 2024) on Otto connections and the e-geodesics, which deepens the information-geometric perspective for transport metrics and suggests new directions for the structure and analysis of families attaining optimal robustness against additive perturbations.
