Published 10 Dec 2025 in stat.ML, cs.LG, and math.ST (arXiv:2512.09499v1)
Abstract: The optimal transport (OT) map is a geometry-driven transformation between high-dimensional probability distributions which underpins a wide range of tasks in statistics, applied probability, and machine learning. However, existing statistical theory for OT map estimation is quite restricted, hinging on Brenier's theorem (quadratic cost, absolutely continuous source) to guarantee existence and uniqueness of a deterministic OT map, on which various additional regularity assumptions are imposed to obtain quantitative error bounds. In many real-world problems these conditions fail or cannot be certified, in which case optimal transportation is possible only via stochastic maps that can split mass. To broaden the scope of map estimation theory to such settings, this work introduces a novel metric for evaluating the transportation quality of stochastic maps. Under this metric, we develop computationally efficient map estimators with near-optimal finite-sample risk bounds, subject to easy-to-verify minimal assumptions. Our analysis further accommodates common forms of adversarial sample contamination, yielding estimators with robust estimation guarantees. Empirical experiments are provided which validate our theory and demonstrate the utility of the proposed framework in settings where existing theory fails. These contributions constitute the first general-purpose theory for map estimation, compatible with a wide spectrum of real-world applications where optimal transport may be intrinsically stochastic.
The paper introduces the novel E_p error metric, quantifying both optimality and feasibility gaps in stochastic optimal transport map estimation.
It presents finite-sample estimators, including entropic kernel and rounding methods, which achieve near-minimax statistical rates under minimal assumptions.
The framework ensures computational efficiency and robustness against adversarial corruptions, expanding optimal transport applicability in complex settings.
Estimation of Stochastic Optimal Transport Maps: Technical Summary
Introduction and Motivation
Optimal transport (OT) underpins a broad spectrum of applications in statistics, probability, and machine learning, providing a principled mechanism for mapping one probability distribution to another with minimal cost, typically measured via p-Wasserstein metrics. Classical OT map estimation theory relies on Brenier's theorem, which requires a quadratic cost and an absolutely continuous source measure to guarantee existence and uniqueness of deterministic OT maps. However, these restrictive assumptions rarely hold in practical settings such as domain adaptation, single-cell genomics, and image-to-text translation, where source and target distributions may lie on fundamentally distinct supports or manifolds, and optimal transport may inherently be stochastic.
This paper develops a comprehensive framework for the statistical estimation of (possibly stochastic) OT maps without recourse to traditional regularity or uniqueness conditions. It introduces a new transportation error metric, Ep, for evaluating stochastic OT map performance, constructs statistically near-optimal estimators under minimal assumptions, and establishes strong robustness guarantees under adversarial sample contamination.
The Ep Error Metric and Its Properties
The novel transportation error metric Ep captures two sources of error for a Markov kernel κ:
Optimality gap: The excess cost incurred above the optimal Wasserstein cost, i.e., $\left(\iint \|x-y\|^p \, \mathrm{d}\kappa_x(y)\, \mathrm{d}\mu(x)\right)^{1/p} - W_p(\mu,\nu)$.
Feasibility gap: The distance between the pushforward of the source distribution under the kernel and the target, i.e., $W_p(\kappa_\sharp \mu, \nu)$.
By combining these, Ep seamlessly generalizes the Lp metric for deterministic maps and also evaluates stochastic kernels even when deterministic OT maps do not exist. This construction enables formal statistical analysis in broad settings outside the purview of classical OT theory.
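As a concrete illustration (not the paper's implementation), both gaps can be computed for a discrete stochastic kernel in one dimension, where the 1-Wasserstein distance has a closed form as the integral of the CDF difference. The additive combination of the two gaps below is an illustrative assumption; the summary does not specify how Ep combines them.

```python
import numpy as np

def w1_1d(x, wx, y, wy):
    """1-Wasserstein distance between discrete 1D measures (x, wx) and (y, wy),
    computed as the integral of the absolute CDF difference."""
    pts = np.concatenate([x, y])
    order = np.argsort(pts)
    pts = pts[order]
    signed = np.concatenate([wx, -wy])[order]      # +mu mass, -nu mass
    cdf_diff = np.cumsum(signed)[:-1]              # F_mu - F_nu between support points
    return np.sum(np.abs(cdf_diff) * np.diff(pts))

def e1_error(x, y, K):
    """E_1-style error of a row-stochastic kernel K (K[i, j] = fraction of the
    mass at x[i] sent to y[j]) between uniform empirical measures on x and y."""
    n, m = K.shape
    mu_w, nu_w = np.full(n, 1 / n), np.full(m, 1 / m)
    cost = np.sum(mu_w[:, None] * K * np.abs(x[:, None] - y[None, :]))
    opt_gap = cost - w1_1d(x, mu_w, y, nu_w)       # excess over W_1(mu, nu)
    push_w = mu_w @ K                              # weights of the pushforward K#mu
    feas_gap = w1_1d(y, push_w, y, nu_w)           # W_1(K#mu, nu)
    return opt_gap + feas_gap                      # assumed additive combination

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5, 2.5])
print(e1_error(x, y, np.eye(3)))                   # monotone matching: error ~ 0
```

The identity kernel here realizes the monotone (sorted) matching, which is optimal in one dimension, so both gaps vanish; any kernel that collapses mass onto one target incurs both a positive optimality gap and a positive feasibility gap.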
Figure 1: Diagrams depicting two deterministic maps and a stochastic kernel for Wp(μ,ν) under non-unique or non-existent optimal maps.
Key properties analyzed include:
Stability under perturbations: Ep admits quantitative bounds under total variation (TV) and Wasserstein changes in source and target measures.
Compositionality: Error propagation through kernel compositions is controlled.
Reduction to Lp: For deterministic maps, Ep is comparable to Lp error against the true OT map.
Relation to Monge gap objectives: A formal connection to Monge gap regularizers used in neural OT map estimation is established and compared quantitatively.
Finite-Sample Kernel Estimation and Computational Guarantees
Two primary finite-sample estimators are presented and rigorously analyzed:
Entropic Kernel Estimator: Leveraging entropic OT (EOT) with empirical measures, the estimator achieves $\mathbb{E}[E_p] = O_{p,d}\big(n^{-1/(2pd \vee 4p)}\big)$ when ν is sub-Gaussian and μ has bounded moments.
Rounding Estimator: By partitioning the source space and rounding empirical samples, a sharp bound of $\mathbb{E}[E_p] = O_{p,d}\big(n^{-1/(d+2p)}\big)$ is obtained under minimal moment assumptions, approaching the minimax lower bound $\Omega\big(n^{-1/(d \vee 2p)}\big)$.
The rounding approach is computationally efficient, requiring $O\big(n^{2+o_d(1)}\big)$ time for the OT subproblem. Both empirically and in its theoretical rates, it outperforms prior methods in settings where regular deterministic OT maps do not exist.
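A minimal sketch of the entropic approach, under the assumption that the estimator reads off the conditional distributions of the entropic plan as the kernel. Plain Sinkhorn scaling on empirical samples in one dimension, with an illustrative quadratic cost and regularization `eps`, stands in for the paper's exact construction:

```python
import numpy as np

def sinkhorn_kernel(x, y, eps=0.5, iters=500):
    """Entropic OT between uniform empirical measures on x and y; returns a
    row-stochastic kernel K with K[i, j] ~= pi(y_j | x_i) under the entropic plan pi."""
    n, m = len(x), len(y)
    C = np.abs(x[:, None] - y[None, :]) ** 2       # quadratic ground cost (illustrative)
    G = np.exp(-C / eps)                           # Gibbs kernel
    a, b = np.full(n, 1 / n), np.full(m, 1 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):                         # alternating marginal scalings
        u = a / (G @ v)
        v = b / (G.T @ u)
    plan = u[:, None] * G * v[None, :]             # entropic transport plan
    return plan / plan.sum(axis=1, keepdims=True)  # conditional kernel rows

# One source point, two equidistant targets: no deterministic map can be both
# optimal and feasible, so the estimated kernel splits mass evenly.
K = sinkhorn_kernel(np.array([0.0]), np.array([-1.0, 1.0]))
print(K)                                           # ~ [[0.5, 0.5]]
```

The single-source example is exactly the regime the paper targets: transport from μ to ν is intrinsically stochastic, yet the kernel output remains well defined and scorable under Ep.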
Figure 2: Empirical E1 and L1 performance of nearest-neighbor and rounding estimators in settings where classical theory fails.
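The rounding estimator described above can be sketched in one dimension. The details here are hypothetical stand-ins for the paper's construction: a uniform grid of width `h` for the partition, snapping to cell centers for the rounding step, and the monotone north-west-corner rule (which is optimal in 1D) for the OT subproblem:

```python
import numpy as np

def ot_plan_1d(xs, wx, ys, wy):
    """Optimal 1D transport plan between sorted discrete measures via the
    monotone (north-west corner) rule, optimal for costs of the form |x - y|^p."""
    plan = np.zeros((len(xs), len(ys)))
    wx, wy = wx.copy(), wy.copy()
    i = j = 0
    while i < len(xs) and j < len(ys):
        m = min(wx[i], wy[j])                       # move as much mass as possible
        plan[i, j] = m
        wx[i] -= m; wy[j] -= m
        if wx[i] <= 1e-12: i += 1
        if wy[j] <= 1e-12: j += 1
    return plan

def rounding_kernel(x, y, h):
    """Round sources to grid cells of width h, solve OT from cell centers to the
    targets, and return each sample's row-stochastic kernel row over sorted y."""
    cells = np.round(x / h).astype(int)
    centers, inv, counts = np.unique(cells, return_inverse=True, return_counts=True)
    wx = counts / len(x)                            # aggregated cell masses
    ys = np.sort(y)
    plan = ot_plan_1d(centers * h, wx, ys, np.full(len(y), 1 / len(y)))
    K = plan / plan.sum(axis=1, keepdims=True)      # one stochastic row per cell
    return K[inv], ys                               # kernel row for each source sample
```

Two source samples sharing a cell inherit the same row, so a cell's mass can be split across several targets; this is precisely the stochastic behavior that the Ep metric is designed to evaluate.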
Statistical Complexity with Hölder Continuous Kernels
For settings admitting optimal kernels that are Hölder continuous (not necessarily deterministic or smooth), the analysis yields information-theoretic reductions: optimal statistical rates for kernel estimation under Ep are equivalent to those for estimating the underlying measures under Wasserstein distance. Plug-in and wavelet-based estimators achieve minimax rates under this relaxed regularity.
Robust Estimation Under Adversarial Corruptions
The framework accommodates a corruption model combining classical TV outliers with local Wasserstein perturbations. The convolution-based estimator matches lower bounds up to constants: any estimator must incur error $\Omega\big(d\epsilon^{1/p} + d^{1/4}\rho^{1/2} + n^{-1/(d \vee 2p)}\big)$, separating the complexity of robust OT map estimation from that of distribution estimation. An efficient implementation is provided for practical use.
Experiments and Empirical Validation
Synthetic experiments demonstrate substantial benefits of the Ep metric and rounding estimator in high-dimensional and irregular OT settings. Nearest-neighbor estimators perform well empirically in low dimensions, but in pathological cases provable guarantees are available only for the rounding estimator.
Figure 3: E1, optimality gap, and feasibility gap of nearest-neighbor and rounding estimators for irregular OT maps (Setting A).
Figure 4: Visualization of the rounding kernel estimator on a two-dimensional checkerboard dataset.
Practical and Theoretical Implications
The proposed framework delivers:
General-purpose OT map estimation: Compatible with stochastic mappings, relaxed assumptions on measures and kernels, and arbitrary p-Wasserstein costs.
Efficiency and robustness: Practical computational complexity and resilience to adversarial data corruption.
Statistical optimality: Achieves near-minimax rates with straightforward estimators.
Potential future work includes multi-scale kernel estimation to further tighten risk bounds, designing efficient Lipschitz kernel estimators at minimax rates, and application to generalized OT contexts (e.g., conditional, weak, or entropy-regularized OT).
Conclusion
This work substantially enlarges the practical applicability and theoretical foundations of OT map estimation, providing a robust, efficient, and general methodology for estimating stochastic optimal transport maps. The technical guarantees for the introduced Ep metric validate its utility as an evaluation criterion, and empirical experiments substantiate theoretical findings. Future directions involve exploration of multi-scale and computationally efficient methods, extending the approach to further generalized and structured OT frameworks.