
Wasserstein Projection Estimator

Updated 11 November 2025
  • The Wasserstein projection estimator is a statistical method that projects empirical measures onto a model manifold by minimizing the quadratic Wasserstein ($W_2$) distance.
  • It employs optimal transport techniques and Kantorovich duality to achieve sensitivity efficiency and asymptotic normality, often outperforming classical estimators such as the MLE.
  • Practical implementations range from closed-form solutions for univariate scale models to gradient-based optimization in high-dimensional settings, with robust performance guaranteed under regularity conditions.

A Wasserstein projection estimator is a statistical procedure that, given empirical data, identifies the closest element of a specified model class or constraint set in the sense of the $p$-Wasserstein distance, particularly the quadratic case $W_2$. This estimator arises as the canonical analogue of the classical maximum likelihood estimator (MLE) in the geometric context of optimal transport, yielding estimators that are efficient or robust in finite samples, and it features prominently in modern statistical theory, computational optimal transport, and machine learning.

1. Formal Definition

Given a parametric statistical model $P = \{ P_\theta : \theta \in \Theta \} \subset \mathcal{P}_2(\mathbb{R}^d)$ (probability measures with finite second moment) and an empirical measure $\bar P_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ from i.i.d. data $X_i \sim P_{\theta^*}$, the Wasserstein projection estimator (WPE) is defined by

$$
T_n^{\mathrm{WPE}} = \arg\min_{\theta\in\Theta} W_2^2(P_\theta, \bar P_n), \qquad W_2^2(P_\theta, \bar P_n) = \min_{\pi \in \Pi(P_\theta, \bar P_n)} \int_{\mathbb{R}^d\times\mathbb{R}^d} \|x - y\|^2 \, d\pi(x,y),
$$

where $\Pi(P_\theta, \bar P_n)$ denotes the set of couplings (measures $\pi$ with marginals $P_\theta$ and $\bar P_n$). For distributions $P_\theta$ absolutely continuous with respect to Lebesgue measure, so that the optimal transport map is unique, this reduces to minimizing the mean squared transport displacement:

$$
T_n^{\mathrm{WPE}} = \arg\min_{\theta} \int_{\mathbb{R}^d} \| t_{P_\theta \to \bar P_n}(x) - x \|^2 \, dP_\theta(x),
$$

where $t_{P_\theta \to \bar P_n}$ is the optimal transport map from $P_\theta$ to $\bar P_n$.

This construction is the projection of the empirical distribution onto the model manifold in the Wasserstein geometry, as opposed to the Kullback–Leibler projection that defines the MLE.
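To make the definition concrete, here is a minimal numerical sketch for a univariate Gaussian location model (illustrative NumPy/SciPy code, not from the paper; it uses the fact that in one dimension $W_2^2$ reduces to the squared $L^2$ distance between quantile functions, and the grid discretization is an assumption made for simplicity):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def w2_sq_location(theta, data, sigma=1.0, n_grid=10_000):
    """Quantile-based W_2^2 between N(theta, sigma^2) and the empirical measure:
    the integral over (0, 1) of (F_theta_inv(u) - Fn_inv(u))^2 du."""
    u = (np.arange(n_grid) + 0.5) / n_grid               # midpoint grid on (0, 1)
    q_model = theta + sigma * norm.ppf(u)                # model quantile function
    q_emp = np.quantile(data, u, method="inverted_cdf")  # empirical quantile function
    return np.mean((q_model - q_emp) ** 2)

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)
res = minimize_scalar(w2_sq_location, args=(data,), bounds=(-10, 10), method="bounded")
print(res.x, data.mean())   # for a location family, the W_2 projection is the sample mean
```

For location families the projection reproduces the sample mean (up to quadrature error), which previews the sensitivity results below.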

2. Sensitivity and the Wasserstein–Cramér–Rao Bound

In classical estimation, the Cramér–Rao framework lower-bounds the variance of unbiased estimators under independent resampling. The Wasserstein–Cramér–Rao theory, introduced in (Trillos et al., 10 Nov 2025), instead focuses on sensitivity: the response of the estimator to infinitesimal perturbations of the data (in analogy with influence functions).

Formally, for an estimator $T_n : \mathbb{R}^{dn} \to \mathbb{R}^k$,

$$
\mathrm{Sen}_{P}(T_n) = \mathbb{E}_P \left[ \sum_{i=1}^n \| \nabla_{x_i} T_n(X_1,\dots,X_n) \|^2 \right].
$$

The central result is a Wasserstein–Cramér–Rao lower bound:

$$
\mathrm{Cos}_\theta(T_n) \succeq \frac{1}{n} (D\chi(\theta))^\top J(\theta)^{-1} D\chi(\theta),
$$

where $\mathrm{Cos}_\theta(T_n)$ is the cosensitivity matrix of $T_n$, $J(\theta)$ is the Wasserstein information matrix (Definition 3.8, Trillos et al., 10 Nov 2025), and $D\chi(\theta)$ is the derivative of the model mean statistic $\chi(\theta)$.

An estimator is sensitivity-efficient if its cosensitivity achieves this lower bound.
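As a quick numerical illustration of these definitions (a sketch; the finite-difference helper is hypothetical and estimates the inner sum for a single sample rather than its expectation over $X \sim P$):

```python
import numpy as np

def sensitivity_one_sample(T, x, eps=1e-6):
    """Finite-difference estimate of sum_i ||grad_{x_i} T(x)||^2 for a scalar
    statistic T of a univariate sample x (one draw; Sen averages this over X ~ P)."""
    grads = np.empty_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        grads[i] = (T(xp) - T(xm)) / (2 * eps)
    return np.sum(grads ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=100)
print(sensitivity_one_sample(np.mean, x))  # 1/n = 0.01: the sample mean is maximally stable
print(sensitivity_one_sample(np.max, x))   # ~1, i.e. O(1): the max tracks a single point
```

The contrast between the mean and the max foreshadows the uniform scale comparison in Section 5.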

3. Achieving Optimal Sensitivity: Asymptotic Properties

Under Wasserstein differentiability (DWS), identifiability, and regularity of the transport maps,

$$
n \sum_{i=1}^n (D_{x_i} T_n^{\mathrm{WPE}})^\top D_{x_i} T_n^{\mathrm{WPE}} \xrightarrow{\mathbb{P}} J(\theta^*)^{-1}
$$

as $n \to \infty$, making the WPE asymptotically sensitivity-efficient (Theorems 5.7 and 5.12, Trillos et al., 10 Nov 2025). In particular, for univariate models:

  • Define $G_n(\theta) = W_2^2(P_\theta, \bar P_n) = \int_0^1 |F_\theta^{-1}(u) - \bar F_n^{-1}(u)|^2 \, du$.
  • The estimator solves the first-order condition $\partial_\theta G_n(\hat\theta_n) = 0$, and differentiating this condition via the implicit function theorem yields explicit formulas for the influence of each data point (see the sketch after this list).
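
A minimal sketch of this recipe for the uniform scale family (illustrative code, not from the paper; the grid-based quantile discretization and the helper name `wpe_uniform_scale` are assumptions). Here the first-order condition is linear in $\theta$, so the argmin is available in closed form per evaluation, and the influence of a single data point can be read off by finite differences:

```python
import numpy as np

def wpe_uniform_scale(x, n_grid=20_000):
    """1D WPE for Uniform[0, theta]: solve d/dtheta G_n(theta) = 0, where
    G_n(theta) = integral over (0, 1) of (theta * u - Fn_inv(u))^2 du.
    The condition is linear in theta: theta = int u Fn_inv(u) du / int u^2 du."""
    u = (np.arange(n_grid) + 0.5) / n_grid               # midpoint grid on (0, 1)
    q_emp = np.quantile(x, u, method="inverted_cdf")     # empirical quantile function
    return np.mean(u * q_emp) / np.mean(u * u)

rng = np.random.default_rng(2)
x = rng.uniform(0, 3.0, size=200)

# Influence of one data point, via finite differences of the estimator.
eps = 1e-6
xp = x.copy()
xp[0] += eps
print((wpe_uniform_scale(xp) - wpe_uniform_scale(x)) / eps)

# Implicit-function-theorem prediction: d T / d x_i = 3 (2 r_i - 1) / (2 n^2),
# where r_i is the rank of x_i (differentiate the closed form given in Section 4).
n, rank = len(x), np.argsort(np.argsort(x))[0] + 1
print(3 * (2 * rank - 1) / (2 * n**2))
```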

In higher dimensions, the proof uses Kantorovich duality, differentiation of Laguerre cells, and regularity of dual potentials. If regularity holds (bounded support, $C^2$ regularity of $p_\theta$, uniform convergence of transport derivatives), then the estimator is both consistent and asymptotically normal.

4. Computational Methods

Closed-form solutions arise in special cases:

  • $d = 1$ (scale families): $T_n^{\mathrm{WPE}} = \arg\min_\theta \int_0^1 | \theta F_1^{-1}(u) - \bar F_n^{-1}(u)|^2 \, du$, with explicit solutions for families like Uniform$[0,\theta]$ (where $F_1^{-1}(u) = u$). Setting the derivative in $\theta$ to zero gives $\hat\theta = 3 \int_0^1 u \, \bar F_n^{-1}(u) \, du$, and evaluating the integral piecewise over the order statistics yields

$$
T_n^{\mathrm{WPE}} = \frac{3}{2n^2} \sum_{i=1}^n (2i-1) X_{(i)}.
$$

This is an $L$-statistic in the order statistics.

  • $d \geq 1$: Each evaluation of the WPE objective requires solving a semi-discrete optimal transport problem between $P_\theta$ (continuous) and $\bar P_n$ (discrete), often via power diagrams or entropic-regularized solvers. The minimization over $\theta$ is handled by gradient-based (or stochastic) optimizers, using the gradient $\partial_\theta W_2^2$.

No universal algorithm is prescribed for the high-dimensional case, but the generic routine is:

  1. For each candidate $\theta$, compute the optimal transport map (via standard OT algorithms).
  2. Compute $W_2^2(P_\theta, \bar P_n)$ and its gradient.
  3. Minimize over $\theta$ via standard optimization.
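
A minimal sketch of this loop in Python, assuming the POT library (`ot`) and approximating the continuous model $P_\theta$ by $m$ samples, so the semi-discrete problem is replaced by a discrete-discrete one. The Gaussian location model $N(\theta, I)$ is an illustrative choice, and a derivative-free optimizer stands in for the gradient step described above:

```python
import numpy as np
import ot  # POT: Python Optimal Transport
from scipy.optimize import minimize

def w2_sq(theta, data, m=2000, seed=0):
    """Monte Carlo approximation of W_2^2(P_theta, empirical(data)).

    P_theta is illustratively N(theta, I) on R^d; a fixed seed keeps the
    objective deterministic across optimizer calls (common random numbers)."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    xs = theta + rng.standard_normal((m, d))       # samples from the model
    M = ot.dist(xs, data)                          # squared Euclidean cost matrix
    a = np.full(m, 1.0 / m)                        # uniform weights on model samples
    b = np.full(len(data), 1.0 / len(data))        # uniform weights on the data
    return ot.emd2(a, b, M)                        # exact discrete OT cost

rng = np.random.default_rng(42)
data = rng.standard_normal((300, 2)) + np.array([1.0, -2.0])
res = minimize(w2_sq, x0=np.zeros(2), args=(data,), method="Nelder-Mead")
print(res.x)   # close to the sample mean of `data`, as expected for a location family
```

For large samples, the exact solver `ot.emd2` can be swapped for the entropic-regularized `ot.sinkhorn2` at the cost of a bias controlled by the regularization parameter.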

5. Statistical Examples and Empirical Performance

Canonical model cases:

  • Gaussian location: $P_\theta = N(\theta, \sigma^2)$; the WPE is the sample mean, attaining the exact sensitivity bound $1/n$.
  • Gaussian variance: the WPE for $(\mathbb{E}[X], \mathbb{E}[X^2])$ yields the unbiased sample variance via the delta method.
  • Uniform scale: several estimators can be compared (see the simulation sketch below):
    • MLE ($\max_i X_i$): sensitivity $O(1)$.
    • Best linear unbiased estimator (BLE, $2 \cdot$ sample mean): sensitivity $4/n$.
    • WPE ($L$-statistic in the order statistics): sensitivity $3/n$, the best among unbiased estimators.

Monte Carlo simulations (Figure 1, Trillos et al., 10 Nov 2025) confirm that the WPE attains the smallest variance and sensitivity constants among classical alternatives across a range of models.
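
A self-contained sketch of such a comparison for the uniform scale model (illustrative NumPy code, not the paper's experiment; Monte Carlo variance is used as an observable proxy for the sensitivity constants):

```python
import numpy as np

def wpe_uniform(x):
    """WPE for Uniform[0, theta]: the L-statistic (3 / (2 n^2)) * sum (2i - 1) X_(i)."""
    n = len(x)
    return 3.0 / (2 * n**2) * np.sum((2 * np.arange(1, n + 1) - 1) * np.sort(x))

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 200, 5000
samples = rng.uniform(0, theta, size=(reps, n))

mle = samples.max(axis=1)                              # max X_i (biased downward)
ble = 2 * samples.mean(axis=1)                         # 2 * sample mean (unbiased)
wpe = np.apply_along_axis(wpe_uniform, 1, samples)     # L-statistic (unbiased)

for name, est in [("MLE", mle), ("BLE", ble), ("WPE", wpe)]:
    print(name, est.mean(), n * est.var())             # mean (vs. theta) and n * variance
```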

6. Exact Versus Asymptotic Efficiency and Model Structure

  • Exact efficiency (finite-$n$ attainment of the lower bound) occurs if and only if the model is a "transport family": the transport linearization takes the form

$$
\Phi_\theta(x) = D\phi(x) \, \Lambda(\theta)^{-1} (D\chi(\theta))^\top,
$$

and the statistic $\phi(X)$ has mean $\chi(\theta)$. Location families, scale families, Gaussian first-two-moment models, some regression models, and Pareto models fall into this category.

  • For general models, the WPE does not attain the bound at finite $n$, but it is always asymptotically efficient under DWS-type regularity.
  • Regularity conditions for the asymptotic results include weak continuity and smoothness of the inverse CDFs in one dimension, and boundedness of the support, uniform $C^2$ smoothness of the densities, and control over dual-potential derivatives and Laguerre-cell integrals for $d \geq 1$.
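
As a worked instance of exact efficiency (an illustration consistent with the statements above; reading the Gaussian location family as a transport family is spelled out here for concreteness): for $P_\theta = N(\theta, \sigma^2)$ the WPE is the sample mean $\bar X_n$, whose gradients satisfy $\nabla_{x_i} \bar X_n = 1/n$ identically, so

$$
\mathrm{Sen}_\theta(\bar X_n) = \mathbb{E}\left[ \sum_{i=1}^n \left( \frac{1}{n} \right)^2 \right] = \frac{1}{n},
$$

which attains the lower bound at every finite $n$, exactly as the transport-family characterization predicts.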

7. Interpretation and Broader Implications

The Wasserstein projection estimator encodes a broader principle: projection in optimal transport geometry yields sensitivity-efficient analogues of classical estimators in the Fisher–Rao setting. This produces estimators with improved robustness, especially in non-Gaussian, heavy-tailed, or contaminated data regimes, where sensitivity to small data perturbations is critical. In standard models, the WPE matches or improves on maximum likelihood in both asymptotic and finite-sample efficiency.

The estimator also admits further extension to models with additional structure (e.g., circular data, manifold constraints, infinite-support measures) and connects directly to contemporary developments in computational optimal transport and empirical process theory (Trillos et al., 10 Nov 2025).


Table: Sensitivity constants for three estimators in the uniform scale model

| Estimator | Sensitivity | Attains bound? |
| --- | --- | --- |
| MLE ($\max_i X_i$) | $O(1)$ | No |
| Best linear unbiased (BLE) | $4/n$ | No |
| WPE ($L$-statistic) | $3/n$ | Yes |

This highlights that the WPE provides the minimal possible sensitivity among unbiased estimators in appropriate transport families.


In summary, the Wasserstein projection estimator generalizes traditional statistical projection and MLE concepts into the optimal transport framework, providing sensitivity-optimal and robust estimators across a breadth of models, with rigorous efficiency guarantees and practical computational pathways.
