Wasserstein Projection Estimator
- The Wasserstein projection estimator is a statistical method that projects empirical measures onto a model manifold by minimizing the quadratic Wasserstein ($W_2$) distance.
- It employs optimal transport techniques and Kantorovich duality to achieve sensitivity-efficiency and asymptotic normality, often outperforming classical estimators like the MLE.
- Practical implementations range from closed-form solutions for univariate scale models to gradient-based optimization in high-dimensional settings, ensuring robust performance under regularity conditions.
A Wasserstein projection estimator is a statistical procedure that, given empirical data, identifies the closest element of a specified model class or constraint set in the sense of the $p$-Wasserstein distance, particularly the quadratic case $p = 2$. This estimator arises as a canonical analogue, in the geometry of optimal transport, of the classical maximum likelihood estimator (MLE), yielding efficient or robust estimators at finite sample sizes, and it features prominently in modern statistical theory, computational optimal transport, and learning.
1. Formal Definition
Given a parametric statistical model $\{P_\theta\}_{\theta \in \Theta} \subset \mathcal{P}_2(\mathbb{R}^d)$ (probability measures with finite second moment) and an empirical measure $\mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$ from i.i.d. data $X_1, \dots, X_n$, the Wasserstein projection estimator (WPE) is defined by
$$\hat\theta_n \in \operatorname*{arg\,min}_{\theta \in \Theta} W_2^2(\mu_n, P_\theta), \qquad W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int \|x - y\|^2 \, d\pi(x, y),$$
where $\pi$ spans the set of couplings $\Pi(\mu,\nu)$ (joint measures with marginals $\mu$, $\nu$). For distributions absolutely continuous with respect to Lebesgue measure and with unique optimal transport maps, this reduces to minimizing the mean squared transport displacement:
$$W_2^2(\mu_n, P_\theta) = \int \|T_\theta(x) - x\|^2 \, dP_\theta(x),$$
where $T_\theta$ is the optimal transport map from $P_\theta$ to $\mu_n$.
This construction is the projection of the empirical distribution onto the model manifold in the Wasserstein geometry, as opposed to the KL projection (MLE).
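As a minimal illustration of the projection in one dimension, where $W_2^2$ reduces to the squared $L^2$ distance between quantile functions, the following sketch computes the WPE by numerical minimization of a discretized quantile objective (the function names are illustrative, not from the paper; a Gaussian location model with known unit scale is assumed):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def wpe_1d(x, model_quantile):
    """Wasserstein projection estimator in 1D: minimize the discretized
    W2^2 between the empirical and model quantile functions."""
    xs = np.sort(x)
    n = len(xs)
    u = (np.arange(n) + 0.5) / n                    # midpoint quantile grid
    w2_sq = lambda theta: np.mean((xs - model_quantile(u, theta)) ** 2)
    return minimize_scalar(w2_sq).x

# Gaussian location model N(theta, 1): F_theta^{-1}(u) = theta + Phi^{-1}(u).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)
theta_hat = wpe_1d(x, lambda u, th: th + norm.ppf(u))
```

Because the objective here is quadratic in $\theta$, `theta_hat` agrees with `x.mean()` up to solver tolerance, consistent with the Gaussian location example discussed below.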
2. Sensitivity and the Wasserstein–Cramér–Rao Bound
In classical estimation, the Cramér–Rao framework quantifies the variance of unbiased estimators under independent resampling. The Wasserstein–Cramér–Rao theory, as introduced in (Trillos et al., 10 Nov 2025), instead focuses on sensitivity: the response of the estimator to infinitesimal perturbations of the data (in analogy to influence functions).
Formally, for an estimator $\hat\theta_n = \hat\theta_n(X_1, \dots, X_n)$, the sensitivity aggregates the squared response to perturbations of the individual data points,
$$\mathrm{Sen}(\hat\theta_n) = \sum_{i=1}^n \mathbb{E}\big[\, \|\nabla_{x_i} \hat\theta_n(X_1, \dots, X_n)\|^2 \,\big].$$
The central result is a Wasserstein–Cramér–Rao lower bound: for unbiased estimators of $m(\theta)$,
$$\mathrm{Sen}(\hat\theta_n) \;\ge\; \frac{1}{n}\, \nabla_\theta m(\theta)^\top G_W(\theta)^{-1} \nabla_\theta m(\theta),$$
where $G_W(\theta)$ is the Wasserstein information matrix (see Definition 3.8, (Trillos et al., 10 Nov 2025)) and $\nabla_\theta m(\theta)$ is the derivative of the model mean statistic $m(\theta) = \mathbb{E}_{P_\theta}[\hat\theta_n]$.
An estimator is sensitivity-efficient if its cosensitivity achieves this lower bound.
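To make the sensitivity concrete, take it to be the sum of squared responses of the estimator to perturbing each datum (an interpretation consistent with the constants $1/n$, $3/n$, $4/n$ quoted later in this article; the paper's exact definition should be consulted). A numerical check for the sample mean, whose sensitivity is then exactly $1/n$:

```python
import numpy as np

def sensitivity(estimator, x, eps=1e-6):
    """Sum of squared numerical derivatives of the estimator
    with respect to each data point."""
    base = estimator(x)
    grads = np.empty(len(x))
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += eps
        grads[i] = (estimator(xp) - base) / eps
    return np.sum(grads ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=50)
sens = sensitivity(np.mean, x)   # each derivative is 1/n, so this is 1/50
```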
3. Achieving Optimal Sensitivity: Asymptotic Properties
Under Wasserstein differentiability (DWS), identifiability, and regularity of the transport maps,
$$\mathrm{Sen}(\hat\theta_n^{\mathrm{WPE}}) = \frac{1}{n}\, \nabla_\theta m(\theta)^\top G_W(\theta)^{-1} \nabla_\theta m(\theta) + o(n^{-1})$$
as $n \to \infty$, making the WPE asymptotically sensitivity-efficient (Theorem 5.7 and Theorem 5.12, (Trillos et al., 10 Nov 2025)). In particular, for univariate models:
- Define the objective $J_n(\theta) = W_2^2(\mu_n, P_\theta) = \int_0^1 \big(F_n^{-1}(u) - F_\theta^{-1}(u)\big)^2 \, du$, where $F_n^{-1}$ and $F_\theta^{-1}$ are the empirical and model quantile functions.
- The estimator solves the first-order condition $J_n'(\hat\theta_n) = 0$, and differentiation via the implicit function theorem yields explicit formulas for the influence of a data point.
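For a concrete instance, consider a univariate scale family $F_\theta^{-1}(u) = \theta F_1^{-1}(u)$; the computation below follows from the quantile representation of $W_2$ and is a worked reconstruction, not quoted from the paper:
$$J_n(\theta) = \int_0^1 \big(F_n^{-1}(u) - \theta F_1^{-1}(u)\big)^2 \, du \quad \Longrightarrow \quad \hat\theta_n = \frac{\int_0^1 F_n^{-1}(u)\, F_1^{-1}(u)\, du}{\int_0^1 F_1^{-1}(u)^2\, du}.$$
Since perturbing the order statistic $X_{(j)}$ changes $F_n^{-1}$ only on $((j-1)/n, j/n]$, the influence of that data point is
$$\frac{\partial \hat\theta_n}{\partial X_{(j)}} = \frac{\int_{(j-1)/n}^{j/n} F_1^{-1}(u)\, du}{\int_0^1 F_1^{-1}(u)^2\, du},$$
which is exactly the type of formula the implicit function theorem delivers.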
In higher dimensions, the proof utilizes Kantorovich duality, differentiation of Laguerre cells, and regularity of the dual potentials. If regularity holds (bounded support, smoothness of the model densities, uniform convergence of the transport-map derivatives), then the estimator is both consistent and asymptotically normal.
4. Computational Methods
Closed-form solutions arise in special cases:
- $d=1$ (scale families): $F_\theta^{-1}(u) = \theta F_1^{-1}(u)$, giving the closed form
$$\hat\theta_n = \frac{\int_0^1 F_n^{-1}(u)\, F_1^{-1}(u)\, du}{\int_0^1 F_1^{-1}(u)^2\, du},$$
with explicit solutions for families like the Uniform$(0,\theta)$ model, where $F_1^{-1}(u) = u$ yields
$$\hat\theta_n = \frac{3}{2n^2} \sum_{i=1}^n (2i-1)\, X_{(i)}.$$
This is an $L$-statistic in the order statistics.
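A quick numerical sanity check of this closed form: the sketch below (illustrative code, not the paper's implementation) compares the $L$-statistic against a brute-force minimization of the discretized quantile objective for a Uniform$(0, 5)$ sample.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
n = 200
x = np.sort(rng.uniform(0.0, 5.0, size=n))       # Uniform(0, theta), theta = 5
i = np.arange(1, n + 1)

# Closed-form WPE: an L-statistic in the order statistics.
theta_closed = 3.0 / (2 * n**2) * np.sum((2 * i - 1) * x)

# Brute force: minimize the discretized W2^2 objective over theta.
u = (i - 0.5) / n                                 # midpoint quantile grid
obj = lambda th: np.mean((x - th * u) ** 2)       # F_theta^{-1}(u) = theta * u
theta_num = minimize_scalar(obj, bounds=(0.0, 20.0), method="bounded").x
```

The two estimates agree up to the $O(n^{-2})$ discretization error of the midpoint grid.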
- $d \ge 1$: Each evaluation of the WPE objective requires solving a semi-discrete optimal transport problem between $P_\theta$ (continuous) and $\mu_n$ (discrete), often via power diagrams or entropic-regularized solvers. The minimization over $\theta$ is handled via gradient-based (or even stochastic) optimizers, using the gradient of $\theta \mapsto W_2^2(\mu_n, P_\theta)$ supplied by the envelope theorem through the optimal dual potentials.
No universal algorithm is provided for the high-dimensional case, but the routine is to:
- For each candidate $\theta$, compute the optimal transport map between $P_\theta$ and $\mu_n$ (via standard OT algorithms).
- Compute $W_2^2(\mu_n, P_\theta)$ and its gradient in $\theta$.
- Minimize with respect to $\theta$ via standard optimization.
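The loop above can be sketched with a discrete stand-in for the semi-discrete step: $P_\theta$ is replaced by a fixed-noise sample (a reparameterization trick) and the semi-discrete solver by an exact assignment between equal-size point clouds. All names here are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment, minimize

def w2_sq_discrete(x, y):
    """Exact W2^2 between two equally weighted point clouds of equal size,
    computed via the optimal assignment (Hungarian algorithm)."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

rng = np.random.default_rng(3)
n = 100
x = rng.normal(loc=[1.0, -0.5], size=(n, 2))     # observed data
z = rng.normal(size=(n, 2))                      # fixed model noise

# Model P_theta = N(theta, I_2), reparameterized as theta + z.
objective = lambda theta: w2_sq_discrete(x, theta[None, :] + z)
theta_hat = minimize(objective, x0=np.zeros(2), method="Nelder-Mead").x
```

For a translation family, the discrete $W_2^2$ splits off the squared mean difference, so the minimizer simply aligns the means of the two clouds; a gradient-free optimizer suffices for this sketch.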
5. Statistical Examples and Empirical Performance
Canonical model cases:
- Gaussian location: $P_\theta = \mathcal{N}(\theta, \sigma^2)$ with known $\sigma^2$; the WPE is the sample mean, attaining the exact sensitivity bound $1/n$.
- Gaussian variance: the WPE for the scale parameter yields the unbiased sample variance via the delta method.
- Uniform scale: various estimators can be compared:
- MLE ($X_{(n)} = \max_i X_i$): does not attain the sensitivity bound.
- Best linear unbiased estimator (BLUE, based on the sample mean): sensitivity $4/n$.
- WPE ($L$-statistic in the order statistics): sensitivity $3/n$ (best among unbiased estimators).
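The $4/n$ and $3/n$ constants can be checked directly from the $L$-statistic weights, assuming the sum-of-squared-coefficients form of the sensitivity for linear estimators and the reconstructed WPE weights $c_i = 3(2i-1)/(2n^2)$:

```python
import numpy as np

n = 1000
i = np.arange(1, n + 1)

# Write each estimator as an L-statistic sum_i c_i X_(i); for such linear
# estimators the sensitivity is sum_i c_i^2.
c_blue = np.full(n, 2.0 / n)                  # BLUE: 2 * sample mean
c_wpe = 3.0 * (2 * i - 1) / (2.0 * n**2)      # WPE weights

sens_blue = np.sum(c_blue ** 2)               # = 4/n exactly
sens_wpe = np.sum(c_wpe ** 2)                 # = 3/n - 3/(4 n^3), approx 3/n
```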
Monte Carlo simulations (see Figure 1, (Trillos et al., 10 Nov 2025)) confirm that the WPE exhibits minimal variance and sensitivity constants compared to classical alternatives over a range of models.
6. Exact Versus Asymptotic Efficiency and Model Structure
- Exact efficiency (finite-$n$ attainment of the lower bound) occurs if and only if the model is a "transport family": the linearization of the optimal transport maps in $\theta$ is carried by a single fixed statistic $s(x)$, and that statistic has mean $\mathbb{E}_{P_\theta}[s(X)] = m(\theta)$. Location families, scale families, Gaussians parameterized by their first two moments, some regression models, and Pareto models fall into this category.
- For general models, the WPE does not attain the bound at finite $n$, but it is always asymptotically efficient under DWS-type regularity.
- Regularity conditions for asymptotic results include: weak continuity and smoothness of the inverse CDFs in 1D; and boundedness, uniform $C^2$ density smoothness, and control over dual-potential derivatives and Laguerre-cell integrals in the multivariate case.
7. Interpretation and Broader Implications
The Wasserstein projection estimator encodes a broader principle: projection in optimal transport geometry yields sensitivity-efficient analogues to classical estimators in the Fisher–Rao setting. This yields estimators with improved robustness properties, especially in non-Gaussian, heavy-tailed, or contaminated data regimes, where sensitivity to small data perturbations is critical. In standard models, WPE matches or improves over maximum likelihood in both asymptotic and finite-sample efficiency.
The estimator also admits further extension to models with additional structure (e.g., circular data, manifold constraints, infinite-support measures) and connects directly to contemporary developments in computational optimal transport and empirical process theory (Trillos et al., 10 Nov 2025).
Table: Sensitivity Constants for Three Estimators in Uniform Scale Model
| Estimator | Sensitivity | Attains Bound? |
|---|---|---|
| MLE ($X_{(n)}$) | | No |
| Best Linear Unbiased (BLUE) | $4/n$ | No |
| WPE ($L$-statistic) | $3/n$ | Yes |
This highlights that the WPE provides the minimal possible sensitivity among unbiased estimators in appropriate transport families.
In summary, the Wasserstein projection estimator generalizes traditional statistical projection and MLE concepts into the optimal transport framework, providing sensitivity-optimal and robust estimators across a breadth of models, with rigorous efficiency guarantees and practical computational pathways.