
Wasserstein Projection Estimator

Updated 11 November 2025
  • The Wasserstein projection estimator is a statistical method that projects empirical measures onto a model manifold by minimizing the quadratic Wasserstein ($W_2$) distance.
  • It employs optimal transport techniques and Kantorovich duality to achieve sensitivity efficiency and asymptotic normality, often outperforming classical estimators such as the MLE.
  • Practical implementations range from closed-form solutions for univariate scale models to gradient-based optimization in high-dimensional settings, with robust performance guaranteed under regularity conditions.

A Wasserstein projection estimator is a statistical procedure that, given empirical data, identifies the closest element of a specified model class or constraint set in the sense of the $p$-Wasserstein distance, particularly the quadratic case $W_2$. This estimator arises as the canonical analogue of the classical maximum likelihood estimator (MLE) in the geometric context of optimal transport, yielding estimators that are efficient or robust in finite samples, and it features prominently in modern statistical theory, computational optimal transport, and machine learning.

1. Formal Definition

Given a parametric statistical model $P = \{ P_\theta : \theta \in \Theta \} \subset \mathcal{P}_2(\mathbb{R}^d)$ (probability measures with finite second moment) and an empirical measure $\bar P_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ from i.i.d. data $X_i \sim P_{\theta^*}$, the Wasserstein projection estimator (WPE) is defined by

$$
T_n^{\mathrm{WPE}} = \arg\min_{\theta\in\Theta} W_2^2(P_\theta, \bar P_n), \qquad W_2^2(P_\theta, \bar P_n) = \min_{\pi \in \Pi(P_\theta, \bar P_n)} \int_{\mathbb{R}^d\times\mathbb{R}^d} \|x - y\|^2 \, d\pi(x,y),
$$

where $\Pi(P_\theta, \bar P_n)$ denotes the set of couplings (measures $\pi$ with marginals $P_\theta$ and $\bar P_n$). For distributions $P_\theta$ absolutely continuous with respect to Lebesgue measure, so that the optimal transport map is unique, this reduces to minimizing the mean squared transport displacement:

$$
T_n^{\mathrm{WPE}} = \arg\min_{\theta} \int_{\mathbb{R}^d} \| t_{P_\theta \to \bar P_n}(x) - x \|^2 \, dP_\theta(x),
$$

where $t_{P_\theta \to \bar P_n}$ is the optimal transport map from $P_\theta$ to $\bar P_n$.

This construction is the projection of the empirical distribution onto the model manifold in the Wasserstein geometry, as opposed to the Kullback–Leibler projection that defines the MLE.
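To make the definition concrete, here is a minimal numerical sketch for a univariate Gaussian location model (illustrative NumPy/SciPy code, not from the paper; it uses the fact that in one dimension $W_2^2$ reduces to the squared $L^2$ distance between quantile functions, and the grid discretization is an assumption made for simplicity):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def w2_sq_location(theta, data, sigma=1.0, n_grid=10_000):
    """Quantile-based W_2^2 between N(theta, sigma^2) and the empirical measure:
    the integral over (0, 1) of (F_theta_inv(u) - Fn_inv(u))^2 du."""
    u = (np.arange(n_grid) + 0.5) / n_grid               # midpoint grid on (0, 1)
    q_model = theta + sigma * norm.ppf(u)                # model quantile function
    q_emp = np.quantile(data, u, method="inverted_cdf")  # empirical quantile function
    return np.mean((q_model - q_emp) ** 2)

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)
res = minimize_scalar(w2_sq_location, args=(data,), bounds=(-10, 10), method="bounded")
print(res.x, data.mean())   # for a location family, the W_2 projection is the sample mean
```

For location families the projection reproduces the sample mean (up to quadrature error), which previews the sensitivity results below.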

2. Sensitivity and the Wasserstein–Cramér–Rao Bound

In classical estimation, the Cramér–Rao framework lower-bounds the variance of unbiased estimators under independent resampling. The Wasserstein–Cramér–Rao theory, introduced in (Trillos et al., 10 Nov 2025), instead focuses on sensitivity: the response of the estimator to infinitesimal perturbations of the data (in analogy with influence functions).

Formally, for an estimator $T_n : \mathbb{R}^{dn} \to \mathbb{R}^k$,

$$
\mathrm{Sen}_{P}(T_n) = \mathbb{E}_P \left[ \sum_{i=1}^n \| \nabla_{x_i} T_n(X_1,\dots,X_n) \|^2 \right].
$$

The central result is a Wasserstein–Cramér–Rao lower bound:

$$
\mathrm{Cos}_\theta(T_n) \succeq \frac{1}{n} (D\chi(\theta))^\top J(\theta)^{-1} D\chi(\theta),
$$

where $\mathrm{Cos}_\theta(T_n)$ is the cosensitivity matrix of $T_n$, $J(\theta)$ is the Wasserstein information matrix (Definition 3.8, Trillos et al., 10 Nov 2025), and $D\chi(\theta)$ is the derivative of the model mean statistic $\chi(\theta)$.

An estimator is sensitivity-efficient if its cosensitivity achieves this lower bound.
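As a quick numerical illustration of these definitions (a sketch; the finite-difference helper is hypothetical and estimates the inner sum for a single sample rather than its expectation over $X \sim P$):

```python
import numpy as np

def sensitivity_one_sample(T, x, eps=1e-6):
    """Finite-difference estimate of sum_i ||grad_{x_i} T(x)||^2 for a scalar
    statistic T of a univariate sample x (one draw; Sen averages this over X ~ P)."""
    grads = np.empty_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        grads[i] = (T(xp) - T(xm)) / (2 * eps)
    return np.sum(grads ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=100)
print(sensitivity_one_sample(np.mean, x))  # 1/n = 0.01: the sample mean is maximally stable
print(sensitivity_one_sample(np.max, x))   # ~1, i.e. O(1): the max tracks a single point
```

The contrast between the mean and the max foreshadows the uniform scale comparison in Section 5.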

3. Achieving Optimal Sensitivity: Asymptotic Properties

Under Wasserstein differentiability (DWS), identifiability, and regularity of the transport maps,

$$
n \sum_{i=1}^n (D_{x_i} T_n^{\mathrm{WPE}})^\top D_{x_i} T_n^{\mathrm{WPE}} \xrightarrow{\mathbb{P}} J(\theta^*)^{-1}
$$

as $n \to \infty$, making the WPE asymptotically sensitivity-efficient (Theorems 5.7 and 5.12, Trillos et al., 10 Nov 2025). In particular, for univariate models:

  • Define $G_n(\theta) = W_2^2(P_\theta, \bar P_n) = \int_0^1 |F_\theta^{-1}(u) - \bar F_n^{-1}(u)|^2 \, du$.
  • The estimator solves the first-order condition $\partial_\theta G_n(\hat\theta_n) = 0$, and differentiating this condition via the implicit function theorem yields explicit formulas for the influence of each data point (see the sketch after this list).
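
A minimal sketch of this recipe for the uniform scale family (illustrative code, not from the paper; the grid-based quantile discretization and the helper name `wpe_uniform_scale` are assumptions). Here the first-order condition is linear in $\theta$, so the argmin is available in closed form per evaluation, and the influence of a single data point can be read off by finite differences:

```python
import numpy as np

def wpe_uniform_scale(x, n_grid=20_000):
    """1D WPE for Uniform[0, theta]: solve d/dtheta G_n(theta) = 0, where
    G_n(theta) = integral over (0, 1) of (theta * u - Fn_inv(u))^2 du.
    The condition is linear in theta: theta = int u Fn_inv(u) du / int u^2 du."""
    u = (np.arange(n_grid) + 0.5) / n_grid               # midpoint grid on (0, 1)
    q_emp = np.quantile(x, u, method="inverted_cdf")     # empirical quantile function
    return np.mean(u * q_emp) / np.mean(u * u)

rng = np.random.default_rng(2)
x = rng.uniform(0, 3.0, size=200)

# Influence of one data point, via finite differences of the estimator.
eps = 1e-6
xp = x.copy()
xp[0] += eps
print((wpe_uniform_scale(xp) - wpe_uniform_scale(x)) / eps)

# Implicit-function-theorem prediction: d T / d x_i = 3 (2 r_i - 1) / (2 n^2),
# where r_i is the rank of x_i (differentiate the closed form given in Section 4).
n, rank = len(x), np.argsort(np.argsort(x))[0] + 1
print(3 * (2 * rank - 1) / (2 * n**2))
```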

In higher dimensions, the proof uses Kantorovich duality, differentiation of Laguerre cells, and regularity of dual potentials. If regularity holds (bounded support, $C^2$ regularity of $p_\theta$, uniform convergence of transport derivatives), then the estimator is both consistent and asymptotically normal.

4. Computational Methods

Closed-form solutions arise in special cases:

  • $d = 1$ (scale families): $T_n^{\mathrm{WPE}} = \arg\min_\theta \int_0^1 | \theta F_1^{-1}(u) - \bar F_n^{-1}(u)|^2 \, du$, with explicit solutions for families like Uniform$[0,\theta]$ (where $F_1^{-1}(u) = u$). Setting the derivative in $\theta$ to zero gives $\hat\theta = 3 \int_0^1 u \, \bar F_n^{-1}(u) \, du$, and evaluating the integral piecewise over the order statistics yields

$$
T_n^{\mathrm{WPE}} = \frac{3}{2n^2} \sum_{i=1}^n (2i-1) X_{(i)}.
$$

This is an $L$-statistic in the order statistics.

  • $d \geq 1$: Each evaluation of the WPE objective requires solving a semi-discrete optimal transport problem between $P_\theta$ (continuous) and $\bar P_n$ (discrete), often via power diagrams or entropic-regularized solvers. The minimization over $\theta$ is handled by gradient-based (or stochastic) optimizers, using the gradient $\partial_\theta W_2^2$.

No universal algorithm is prescribed for the high-dimensional case, but the generic routine is:

  1. For each candidate $\theta$, compute the optimal transport map (via standard OT algorithms).
  2. Compute $W_2^2(P_\theta, \bar P_n)$ and its gradient.
  3. Minimize over $\theta$ via standard optimization.
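
A minimal sketch of this loop in Python, assuming the POT library (`ot`) and approximating the continuous model $P_\theta$ by $m$ samples, so the semi-discrete problem is replaced by a discrete-discrete one. The Gaussian location model $N(\theta, I)$ is an illustrative choice, and a derivative-free optimizer stands in for the gradient step described above:

```python
import numpy as np
import ot  # POT: Python Optimal Transport
from scipy.optimize import minimize

def w2_sq(theta, data, m=2000, seed=0):
    """Monte Carlo approximation of W_2^2(P_theta, empirical(data)).

    P_theta is illustratively N(theta, I) on R^d; a fixed seed keeps the
    objective deterministic across optimizer calls (common random numbers)."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    xs = theta + rng.standard_normal((m, d))       # samples from the model
    M = ot.dist(xs, data)                          # squared Euclidean cost matrix
    a = np.full(m, 1.0 / m)                        # uniform weights on model samples
    b = np.full(len(data), 1.0 / len(data))        # uniform weights on the data
    return ot.emd2(a, b, M)                        # exact discrete OT cost

rng = np.random.default_rng(42)
data = rng.standard_normal((300, 2)) + np.array([1.0, -2.0])
res = minimize(w2_sq, x0=np.zeros(2), args=(data,), method="Nelder-Mead")
print(res.x)   # close to the sample mean of `data`, as expected for a location family
```

For large samples, the exact solver `ot.emd2` can be swapped for the entropic-regularized `ot.sinkhorn2` at the cost of a bias controlled by the regularization parameter.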

5. Statistical Examples and Empirical Performance

Canonical model cases:

  • Gaussian location: $P_\theta = N(\theta, \sigma^2)$; the WPE is the sample mean, attaining the exact sensitivity bound $1/n$.
  • Gaussian variance: the WPE for $(\mathbb{E}[X], \mathbb{E}[X^2])$ yields the unbiased sample variance via the delta method.
  • Uniform scale: several estimators can be compared (see the simulation sketch below):
    • MLE ($\max_i X_i$): sensitivity $O(1)$.
    • Best linear unbiased estimator (BLE, $2 \cdot$ sample mean): sensitivity $4/n$.
    • WPE ($L$-statistic in the order statistics): sensitivity $3/n$, the best among unbiased estimators.

Monte Carlo simulations (Figure 1, Trillos et al., 10 Nov 2025) confirm that the WPE attains the smallest variance and sensitivity constants among classical alternatives across a range of models.
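
A self-contained sketch of such a comparison for the uniform scale model (illustrative NumPy code, not the paper's experiment; Monte Carlo variance is used as an observable proxy for the sensitivity constants):

```python
import numpy as np

def wpe_uniform(x):
    """WPE for Uniform[0, theta]: the L-statistic (3 / (2 n^2)) * sum (2i - 1) X_(i)."""
    n = len(x)
    return 3.0 / (2 * n**2) * np.sum((2 * np.arange(1, n + 1) - 1) * np.sort(x))

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 200, 5000
samples = rng.uniform(0, theta, size=(reps, n))

mle = samples.max(axis=1)                              # max X_i (biased downward)
ble = 2 * samples.mean(axis=1)                         # 2 * sample mean (unbiased)
wpe = np.apply_along_axis(wpe_uniform, 1, samples)     # L-statistic (unbiased)

for name, est in [("MLE", mle), ("BLE", ble), ("WPE", wpe)]:
    print(name, est.mean(), n * est.var())             # mean (vs. theta) and n * variance
```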

6. Exact Versus Asymptotic Efficiency and Model Structure

  • Exact efficiency (finite-$n$ attainment of the lower bound) occurs if and only if the model is a "transport family": the transport linearization takes the form

$$
\Phi_\theta(x) = D\phi(x) \, \Lambda(\theta)^{-1} (D\chi(\theta))^\top,
$$

and the statistic $\phi(X)$ has mean $\chi(\theta)$. Location families, scale families, Gaussian first-two-moment models, some regression models, and Pareto models fall into this category.

  • For general models, the WPE does not attain the bound at finite $n$, but it is always asymptotically efficient under DWS-type regularity.
  • Regularity conditions for the asymptotic results include weak continuity and smoothness of the inverse CDFs in one dimension, and boundedness of the support, uniform $C^2$ smoothness of the densities, and control over dual-potential derivatives and Laguerre-cell integrals for $d \geq 1$.
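
As a worked instance of exact efficiency (an illustration consistent with the statements above; reading the Gaussian location family as a transport family is spelled out here for concreteness): for $P_\theta = N(\theta, \sigma^2)$ the WPE is the sample mean $\bar X_n$, whose gradients satisfy $\nabla_{x_i} \bar X_n = 1/n$ identically, so

$$
\mathrm{Sen}_\theta(\bar X_n) = \mathbb{E}\left[ \sum_{i=1}^n \left( \frac{1}{n} \right)^2 \right] = \frac{1}{n},
$$

which attains the lower bound at every finite $n$, exactly as the transport-family characterization predicts.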

7. Interpretation and Broader Implications

The Wasserstein projection estimator encodes a broader principle: projection in optimal transport geometry yields sensitivity-efficient analogues of classical estimators in the Fisher–Rao setting. This produces estimators with improved robustness, especially in non-Gaussian, heavy-tailed, or contaminated data regimes, where sensitivity to small data perturbations is critical. In standard models, the WPE matches or improves on maximum likelihood in both asymptotic and finite-sample efficiency.

The estimator also admits further extension to models with additional structure (e.g., circular data, manifold constraints, infinite-support measures) and connects directly to contemporary developments in computational optimal transport and empirical process theory (Trillos et al., 10 Nov 2025).


Table: Sensitivity constants for three estimators in the uniform scale model

| Estimator | Sensitivity | Attains bound? |
| --- | --- | --- |
| MLE ($\max_i X_i$) | $O(1)$ | No |
| Best linear unbiased (BLE) | $4/n$ | No |
| WPE ($L$-statistic) | $3/n$ | Yes |

This highlights that the WPE provides the minimal possible sensitivity among unbiased estimators in appropriate transport families.


In summary, the Wasserstein projection estimator generalizes traditional statistical projection and MLE concepts into the optimal transport framework, providing sensitivity-optimal and robust estimators across a breadth of models, with rigorous efficiency guarantees and practical computational pathways.
