
Statistical Optimal Transport

Updated 5 February 2026
  • Statistical optimal transport is a framework for estimating optimal transport distances and maps from finite sample data, integrating inference, convex analysis, and empirical process theory.
  • Methodologies such as plug-in estimators, semidual approaches, and entropic regularization mitigate high-dimensional challenges and, in favorable regimes, achieve dimension-free convergence rates.
  • Applications span machine learning, generative modeling, domain adaptation, and robust inference, with rigorous performance guarantees and uncertainty quantification.

Statistical optimal transport (SOT) is the field concerned with inference, estimation, and computational analysis of optimal transport (OT) quantities—distances, couplings, and maps—given finite samples from unknown probability distributions. SOT lies at the intersection of mathematical statistics, high-dimensional probability, convex analysis, empirical process theory, and computational mathematics, with applications spanning machine learning, biology, generative modeling, and information geometry. The central statistical problem is to consistently and efficiently recover population-level OT functionals (e.g., Wasserstein distances or Monge maps) based solely on sample data, and to provide rigorous uncertainty quantification, rates of convergence, and efficient algorithms adapted to high-dimensional or structured regimes.

1. Formulations and Theoretical Foundations

Let $\mu, \nu$ be Borel probability measures on $\mathbb{R}^d$ and $c(x,y)$ a lower semicontinuous cost (commonly, $c(x,y) = \|x - y\|^p$ for $p \ge 1$).

  • Monge Problem: Find a measurable map $T : \mathbb{R}^d \to \mathbb{R}^d$ with $T_{\#}\mu = \nu$ minimizing $\int c(x, T(x))\, d\mu(x)$.
  • Kantorovich Problem: Minimize $\int c(x, y)\, d\gamma(x,y)$ over couplings $\gamma$ with marginals $\mu$, $\nu$. This relaxation always admits a solution and defines the $p$-Wasserstein distance:

$$W_p(\mu, \nu) = \left(\inf_{\gamma \in \Gamma(\mu, \nu)} \iint \|x-y\|^p\, d\gamma(x,y)\right)^{1/p}.$$

  • Duality: The dual problem maximizes $\int f\,d\mu + \int g\,d\nu$ subject to $f(x) + g(y) \leq c(x, y)$, with $f, g$ in suitable function spaces. When $p=2$ and $\mu$ is absolutely continuous, the optimal map exists and equals the gradient of a convex function (the Brenier map) (Chewi et al., 2024, Balakrishnan et al., 23 Jun 2025).

Statistical optimal transport concerns itself with estimating these quantities using empirical measures $\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$, $\hat{\nu}_n = \frac{1}{n}\sum_{i=1}^n \delta_{Y_i}$ built from i.i.d. samples.
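In one dimension the empirical problem has a closed form: the optimal coupling matches order statistics, so $W_p(\hat\mu_n, \hat\nu_n)$ is computed from sorted samples. A minimal sketch (NumPy; the helper name is ours, equal sample sizes assumed):

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """Plug-in W_p between two equal-size 1-D samples via sorted order statistics."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape, "equal sample sizes assumed for simplicity"
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

# Translating a sample by c shifts it by exactly c in any W_p.
x = np.array([0.1, 0.5, 0.9, 1.3])
print(wasserstein_1d(x, x + 2.0, p=2))  # 2.0
```

The quantile-coupling structure is special to the real line; it is what makes 1-D OT statistics (and sliced variants) computationally trivial.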

2. Statistical Estimation and Computational Methodologies

2.1 Plug-in and Semidual Approaches

The plug-in estimator replaces $\mu$ and $\nu$ by their empirical distributions in the OT problem, yielding

$$\widehat{W}_p = W_p(\hat{\mu}_n, \hat{\nu}_n),$$

and analogously for couplings and maps (Chewi et al., 2024, Balakrishnan et al., 23 Jun 2025). However, in high dimensions this approach suffers from the classical curse of dimensionality, with convergence rates $\sim n^{-1/d}$ for $d \ge 3$ (Forrow et al., 2018, Ding et al., 2024).
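For equal-size empirical measures with uniform weights, the Kantorovich problem reduces to an assignment problem, so the plug-in estimate can be computed exactly with the Hungarian algorithm. A sketch (NumPy/SciPy; the function name is ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def plugin_wasserstein(X, Y, p=2):
    """Exact plug-in W_p between two n-point uniform empirical measures in R^d."""
    C = cdist(X, Y, metric="euclidean") ** p   # n x n cost matrix ||x_i - y_j||^p
    rows, cols = linear_sum_assignment(C)      # optimal permutation coupling
    return float(np.mean(C[rows, cols]) ** (1.0 / p))

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
# A pure translation has W_2 equal to the shift length.
print(plugin_wasserstein(X, X + np.array([3.0, 0.0, 0.0])))  # 3.0
```

The $O(n^3)$ assignment solve is exact but does not scale; the regularized methods below trade exactness for tractability.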

The semidual approach estimates the Kantorovich potential by solving the empirical dual

$$(\hat f, \hat g) = \arg\max_{f,g}\ \frac{1}{n}\sum_{i}f(X_i) + \frac{1}{n}\sum_j g(Y_j) \quad \text{subject to } f(x) + g(y) \leq c(x,y).$$

For the quadratic cost, the estimated OT map is then $\widehat{T} = \nabla\hat\varphi$, where the convex potential is recovered from the dual via $\hat f = \|\cdot\|^2 - 2\hat\varphi$ (Chewi et al., 2024, Balakrishnan et al., 17 Feb 2025).
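One case where the Brenier map is explicit: between Gaussians $N(m_1,\Sigma_1)$ and $N(m_2,\Sigma_2)$ it is affine, $T(x) = m_2 + A(x - m_1)$ with $A = \Sigma_1^{-1/2}\big(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\big)^{1/2}\Sigma_1^{-1/2}$, which satisfies $A\Sigma_1 A^\top = \Sigma_2$. A numerical check of that pushforward identity (NumPy/SciPy):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_brenier_matrix(S1, S2):
    """Linear part A of the Brenier map between N(m1, S1) and N(m2, S2)."""
    r = np.real(sqrtm(S1))            # symmetric square root of S1
    r_inv = np.linalg.inv(r)
    return r_inv @ np.real(sqrtm(r @ S2 @ r)) @ r_inv

S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.0, -0.3], [-0.3, 3.0]])
A = gaussian_brenier_matrix(S1, S2)
# Mapping N(0, S1) through x -> A x yields covariance A S1 A^T = S2.
print(np.allclose(A @ S1 @ A.T, S2))  # True
```

Such closed forms serve as ground truth when benchmarking semidual or neural map estimators.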

2.2 Regularization and Dimension-Free Methods

Due to limited scalability, several regularized and structural approaches have been proposed:

  • Entropic Regularization: The entropic OT problem adds an entropy penalty $\varepsilon\,\mathrm{KL}(\gamma \,\|\, \mu \otimes \nu)$, leading to computationally tractable Sinkhorn algorithms and dimension-independent statistical rates for fixed $\varepsilon > 0$ (Goldfeld et al., 2022, Chewi et al., 2024).
  • Factored Couplings/Transport Rank: Low-rank structure is imposed on couplings; the FactoredOT algorithm constructs transport plans with low transport rank via alternating minimization over cluster centers and transport plans using entropic Sinkhorn regularization. This breaks the $n^{-1/d}$ curse, achieving parametric rates $\sqrt{(k^3 d \log k)/n}$ that depend only on the transport rank $k$, not the ambient dimension (Forrow et al., 2018).
  • Kernel Mean Embedding: OT is reformulated as learning a kernel mean embedding of the transport plan, regularized by the maximum mean discrepancy (MMD), yielding dimension-free sample complexity $O(1/\sqrt{n})$ (Nath et al., 2020).
  • RKHS/Infinite-Dimensional SOS Methods: For smooth densities, sum-of-squares representations and kernel embeddings yield estimators with sample and time exponents independent of the dimension for sufficiently high smoothness, circumventing the curse at the expense of exponentially large constants (Vacher et al., 2021).
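The entropic problem above is solved by Sinkhorn iterations: alternately rescale the rows and columns of a Gibbs kernel until both marginals match. A minimal sketch (NumPy; plain scaling without log-domain stabilization, so it assumes a moderate $\varepsilon$):

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.5, n_iter=1000):
    """Entropic OT plan for cost matrix C and marginal weight vectors a, b."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # scale to match column marginals
        u = a / (K @ v)                  # scale to match row marginals
    return u[:, None] * K * v[None, :]   # coupling gamma = diag(u) K diag(v)

rng = np.random.default_rng(1)
X, Y = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared-distance costs
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
G = sinkhorn(C, a, b)
print(np.allclose(G.sum(axis=1), a))  # True: rows match exactly after the last u-update
```

Production implementations work in the log domain for small $\varepsilon$ and use a marginal-error stopping rule instead of a fixed iteration count.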

2.3 Neural and Adversarial OT Solvers

Dual potentials and transport maps are parameterized using input-convex neural networks (ICNNs). Adversarial or minimax (semi-dual) neural optimizations provide end-to-end statistical guarantees for the learned map, with errors controlled by Rademacher complexity and architecture capacity. Convergence rates of $O(1/\sqrt{n})$ plus approximation bias are obtained under strong convexity and boundedness (Tarasov et al., 3 Feb 2025, Ding et al., 2024).
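The defining ICNN constraint is that hidden-to-hidden weights are nonnegative and activations are convex and nondecreasing, which makes the scalar output convex in the input. A minimal untrained forward-pass sketch in NumPy (the two-layer architecture and weight names are illustrative, not a specific published model):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 16
# Convexity recipe: x enters affinely, hidden state is combined with
# NONNEGATIVE weights, and ReLU is convex and nondecreasing.
W_x1, b1 = rng.normal(size=(h, d)), rng.normal(size=h)
W_z2 = np.abs(rng.normal(size=h))        # nonnegativity via abs reparameterization
W_x2, b2 = rng.normal(size=d), float(rng.normal())

def icnn(x):
    """Scalar convex function of x: affine -> ReLU -> nonneg-weighted affine + skip."""
    z1 = np.maximum(W_x1 @ x + b1, 0.0)
    return float(W_z2 @ z1 + W_x2 @ x + b2)

# Midpoint inequality along a random segment certifies convexity numerically.
x, y = rng.normal(size=d), rng.normal(size=d)
print(icnn((x + y) / 2) <= (icnn(x) + icnn(y)) / 2 + 1e-9)  # True
```

In neural OT training, such a network plays the role of the convex potential $\varphi$, and the map is read off as its gradient.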

3. Convergence Rates, Stability, and Minimax Theory

The convergence rates depend critically on the regularity of the underlying distributions, the complexity of the function class, and imposed structure:

  • General Distributions: Plug-in estimators for $W_1$ satisfy $\mathbb{E}\, W_1(\hat\mu_n, \mu) \asymp n^{-1/d}$ for $d \ge 3$, with similar dimension-dependent rates for $W_p$ (Chewi et al., 2024).
  • Smooth Densities / Log-Concave: If both densities are smooth and supported on convex bodies, the minimax-optimal rate for the $L^2$-risk of OT map estimation is $n^{-2/d}$ (for large $d$) (Balakrishnan et al., 17 Feb 2025), while under $C^s$ density smoothness, faster $n^{-2(s+1)/(2s+d)}$ rates are achievable (Balakrishnan et al., 23 Jun 2025).
  • Local Poincaré Inequalities: Novel local Poincaré-type inequalities allow variance control for differences of smooth potentials under only local density and mild topological conditions, giving parametric or near-parametric rates in Donsker regimes for broad function classes (Ding et al., 2024).
  • Low Transport Rank: Estimation errors depend only on the transport rank $k$ and not directly on dimension, e.g., $O(\sqrt{k^3 d \log k / n})$ (Forrow et al., 2018).
  • Entropic and Sliced OT: Regularization (entropic, slicing, kernel smoothing) yields dimension-free $n^{-1/2}$ rates in many settings (Goldfeld et al., 2022).
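The sliced construction behind such dimension-free rates averages 1-D Wasserstein distances over random projection directions, each computable by sorting. A Monte Carlo sketch (NumPy; equal sample sizes assumed, function name is ours):

```python
import numpy as np

def sliced_w2(X, Y, n_proj=200, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between equal-size samples."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # directions on the sphere
    # Each column is a projected 1-D problem, solved by sorting.
    px, py = np.sort(X @ theta.T, axis=0), np.sort(Y @ theta.T, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))          # RMS of 1-D W_2 over slices

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
print(sliced_w2(X, X))            # 0.0 for identical samples
print(sliced_w2(X, X + 1.0) > 0)  # strictly positive after a shift
```

Because each slice is a 1-D problem, the empirical rate inherits the one-dimensional $n^{-1/2}$ behavior rather than the ambient $n^{-1/d}$.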

Non-asymptotic stability bounds further show that estimation error in the OT map can be upper-bounded by a function of the $W_2$ errors of the input distributions, plus moment and smoothness constants (Balakrishnan et al., 17 Feb 2025, Ding et al., 2024, Balakrishnan et al., 23 Jun 2025).

4. Distributional Limit Theory, Inference, and Robustness

Classical and recent results provide precise distributional descriptions for OT functionals:

  • Central Limit Theorems: For one-dimensional Wasserstein distances, $\sqrt{n}\,\big(W_p(\hat\mu_n, Q) - W_p(\mu, Q)\big)$ converges to a Gaussian law under mild moment and smoothness conditions (Barrio et al., 25 May 2025, Ponnoprat et al., 2023). In higher dimensions, the limiting variance is the variance of the Kantorovich potential under the sampled distribution.
  • Non-Gaussian Limits: For discrete or semi-discrete distributions, directional Hadamard differentiability yields limit distributions that may not be Gaussian (Sadhu et al., 2023).
  • Bootstrap and Confidence Bands: Uniform confidence bands for OT maps on the real line and robust bootstrap procedures (e.g., mm-out-of-nn bootstrap for directionally Hadamard differentiable cases) have been developed and theoretically validated (Ponnoprat et al., 2023, Hundrieser et al., 2021).
  • Robust and Outlier-Resistant OT: The $\varepsilon$-outlier-robust Wasserstein distance allows for trimming of distributions, yielding minimax-optimal rates under Huber-type contamination and practical dual forms for robust inference (Nietert et al., 2021).
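For a 1-D two-sample $W_1$, a simple percentile bootstrap illustrates the resampling inference above; passing $m < n$ gives the $m$-out-of-$n$ variant used when full-sample bootstrap validity fails. A sketch (NumPy; helper names are ours, and the percentile interval assumes the smooth Gaussian-limit regime):

```python
import numpy as np

def w1_1d(x, y):
    """1-D plug-in W_1 via sorted samples (equal sizes)."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def bootstrap_ci(x, y, m=None, B=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for W_1; m < len(x) gives the m-out-of-n variant."""
    rng = np.random.default_rng(seed)
    m = m or len(x)
    stats = [w1_1d(rng.choice(x, m), rng.choice(y, m)) for _ in range(B)]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(3)
x, y = rng.normal(0, 1, 400), rng.normal(1, 1, 400)
lo, hi = bootstrap_ci(x, y, m=200)
print(lo <= hi, lo >= 0)  # True True
```

For non-differentiable (e.g. discrete) cases the naive $n$-out-of-$n$ bootstrap can be inconsistent, which is precisely why the $m$-out-of-$n$ correction appears in the literature cited above.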

5. Generalizations and Applications

5.1 Generalizations of Statistical OT

  • Statistical Manifold Embeddings: OT with cumulant-generating costs induces a geometry on the space of probability distributions, connecting OT with information geometry of exponential families (Pal, 2017).
  • Chain-Rule Optimal Transport (CROT): OT distances defined on marginals and conditionals yield metrics on mixture models, upper-bounding $f$-divergences in mixture families, and supporting fast Sinkhorn-type computation for hierarchical OT problems (Nielsen et al., 2018).
  • Transport Dependency: Transport dependency and its normalized forms provide statistically consistent and flexible correlation-like dependence measures with formal properties analogous to distance correlation, adaptive to the intrinsic metric structure (Nies et al., 2021).

5.2 Applications

  • Domain Adaptation: Low-rank statistical OT has demonstrated superior accuracy in transferring biological labels across single-cell RNA-seq protocols (Forrow et al., 2018).
  • Generative Modeling: Adversarial neural OT solvers align complex data distributions and enable sample-efficient generative models (Tarasov et al., 3 Feb 2025).
  • Dependency and Graphical Modeling: Transport-based dependency coefficients allow general-purpose, high-power independence testing and network construction in genomics (Nies et al., 2021).
  • Shape and Mixture Comparison: CROT and related composite OT metrics support principled comparison of complex mixtures and learning simplified Gaussian mixture models (Nielsen et al., 2018).
  • Robust Generative Architectures: Plugging robust OT distances into Wasserstein GANs and related architectures yields resilience to contamination without extensive hyperparameter tuning (Nietert et al., 2021).

6. Open Problems and Future Directions

While the theoretical and practical landscape of statistical optimal transport has advanced substantially, several challenging avenues remain:

  • Curse of Dimensionality: Characterizing settings where smoothing, low-dimensional structure, or regularization defeat the $n^{-1/d}$ curse.
  • Nonconvex and Nonquadratic Costs: Extending limit theory for maps, plans, and costs to general ground costs and non-Euclidean geometries (Balakrishnan et al., 23 Jun 2025).
  • General Weak Convergence: Process-level CLTs for OT maps and potentials in high dimensions are not available outside of the most regular settings.
  • Adaptive and Data-Driven Tuning: Optimal, principled selection of smoothing or regularization parameters in Sinkhorn, kernel-smoothed, or sliced OT.
  • Unified Computational–Statistical Analysis: The joint behavior of estimation error and iteration complexity for scalable OT algorithms across sample sizes and problem parameters remains an open field.
  • Inference with Dependent Data and Time Series: Extending SOT to dependent samples, time-varying distributions, or online/sequential data streams.

Statistical optimal transport thus stands as a central field in modern data-driven mathematics, integrating deep convex-analytic, probabilistic, and algorithmic concepts, and serving as a foundation for robust inference, machine learning, and scientific modeling in high-dimensional and structured domains (Chewi et al., 2024, Balakrishnan et al., 23 Jun 2025, Barrio et al., 25 May 2025, Forrow et al., 2018, Ding et al., 2024).
