Optimal Transport Ambiguity Sets

Updated 15 December 2025

Optimal Transport Ambiguity Sets are families of probability distributions defined via Wasserstein distances and cost functions, capturing both stochastic variability and adversarial uncertainty.
They integrate geometric, statistical, and domain-specific constraints to address high-dimensional, structured uncertainty in robust optimization and control applications.
Recent advances enable tractable dual formulations and loss-aware parameterizations that mitigate conservatism and enhance computational efficiency in decision-making problems.

Optimal transport (OT) ambiguity sets are families of probability distributions defined via Wasserstein distances or general optimal-transport costs from a reference distribution. They are core to modern robust statistics, distributionally robust optimization (DRO), model predictive control (MPC), and partial identification. OT ambiguity sets offer a unifying and expressive mathematical framework for modeling uncertainty, capturing both stochastic variability and adversarial uncertainty, and embedding structural, geometric, or domain-specific information into tractable optimization formulations.

1. Mathematical Formulations of OT Ambiguity Sets

The canonical OT ambiguity set, often called the Wasserstein ball, is

$\mathcal B_\rho^c(\mu_0) = \bigl\{ \mu \in \mathcal P(\mathcal X) : W_c(\mu,\mu_0) \leq \rho \bigr\}$

where $\mathcal P(\mathcal X)$ is the space of Borel probability measures on a Polish space $\mathcal X$, $c:\mathcal X\times\mathcal X\to [0,\infty)$ is a lower-semicontinuous transportation cost, and $W_c$ is the associated optimal transport (Kantorovich) cost: $W_c(\mu,\mu_0) = \inf_{\pi\in\Pi(\mu,\mu_0)} \int_{\mathcal X\times\mathcal X} c(x,y)\ d\pi(x,y)$, with $\Pi(\mu,\mu_0)$ the set of couplings with marginals $\mu$ and $\mu_0$ (Aolaritei et al., 2022, Ohnemus et al., 16 Sep 2025, Chaouach et al., 2023). For a general cost $c$, $W_c$ need not be a metric.

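Classical choices include $c_p(x,y) = \|x-y\|^p$ for $p\ge 1$, yielding $p$-Wasserstein balls; the $p$-Wasserstein distance is $W_p = W_{c_p}^{1/p}$, so a radius $\rho$ on $W_{c_p}$ corresponds to a radius $\rho^{1/p}$ on $W_p$. Empirical distributions (e.g., $P^N = \frac{1}{N}\sum_{i=1}^N \delta_{\xi^i}$) serve as the reference $\mu_0$ in data-driven settings.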
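As a minimal sketch of the definition above, assume scalar samples and two empirical distributions with the same number of equally weighted atoms. In that case the optimal coupling pairs sorted samples, so $W_p$ reduces to a sort. The function names `wasserstein_1d` and `in_wasserstein_ball` are illustrative and do not come from the cited works; the code returns $W_p$ (with the $1/p$ root), not the raw transport cost.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size empirical measures on the real line.

    Assumes uniform weights 1/N on every sample. In one dimension the optimal
    coupling pairs the i-th smallest point of xs with the i-th smallest of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("expected two non-empty samples of equal size")
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
    return cost ** (1.0 / p)


def in_wasserstein_ball(candidate, reference, radius, p=1):
    """Check whether the empirical measure of `candidate` is within `radius` of `reference`."""
    return wasserstein_1d(candidate, reference, p) <= radius


reference = [0.0, 1.0, 2.0]
shifted = [1.0, 2.0, 3.0]  # every atom moved by one unit
distance = wasserstein_1d(shifted, reference, p=1)
```

Here every atom moves by one unit, so the distance equals one and the shifted sample belongs to any ball of radius at least one around the reference.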

Partition-based ambiguity sets extend this structure:

$\mathcal U_{\rho,\epsilon}(\widehat Q) = \left\{ Q = \sum_i p_i Q_i :\ p \in \mathcal P,\, \widetilde c(p, \widehat p) \leq \rho,\, \sum_i p_i\, \mathcal W(Q_i, \widehat Q_i) \leq \epsilon \right\}$

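where, in this formula, $\mathcal P$ denotes the set of admissible class-probability vectors determined by order-cone constraints (not the space of all probability measures used above), and $\widetilde c$ is a divergence or norm on class probabilities (Esteban-Pérez et al., 2019).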

Structured OT sets further generalize via multi-transport hyperrectangles: $\mathcal T_p(Q, \epsilon) = \left\{ P = \mathrm{pr}_{2\#} \pi : \pi \in \mathcal P(\Xi\times\Xi),\ \mathrm{pr}_{1\#}\pi = Q, \int \rho_k(\zeta_k, \xi_k)^p d\pi(\zeta, \xi) \leq \epsilon_k^p,\ \forall k \right\}$ capturing independent or structured uncertainty (Chaouach et al., 2023, Chaouach et al., 9 Apr 2025).
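Any feasible coupling certifies membership in such a structured set. The sketch below assumes scalar components with $\rho_k(\zeta_k,\xi_k) = |\zeta_k - \xi_k|$ and uses the index-wise pairing of reference and candidate samples as one particular coupling, so the check is sufficient but not necessary. All function names are hypothetical.

```python
def per_component_costs(reference, candidate, p=1):
    """Per-coordinate transport costs of the index-wise coupling.

    reference, candidate: equal-length lists of tuples, paired by index,
    each carrying weight 1/N (an assumption of this sketch).
    """
    n = len(reference)
    dim = len(reference[0])
    return [
        sum(abs(z[k] - x[k]) ** p for z, x in zip(reference, candidate)) / n
        for k in range(dim)
    ]


def certifies_membership(reference, candidate, radii, p=1):
    """True if the index-wise coupling meets every per-component budget eps_k ** p."""
    costs = per_component_costs(reference, candidate, p)
    return all(cost <= eps ** p for cost, eps in zip(costs, radii))


reference = [(0.0, 0.0), (1.0, 1.0)]
candidate = [(0.1, 0.0), (1.1, 2.0)]  # small shift in coordinate 0, larger in coordinate 1
costs = per_component_costs(reference, candidate)
```

Each coordinate has its own budget, so a perturbation that is small in one coordinate and large in another is accepted or rejected coordinate by coordinate rather than through a single aggregate radius.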

These constructions are foundational for expressing distributional uncertainty in high-dimensional or structured spaces and for incorporating side information or expert knowledge (e.g., ordering, modality) through constraints on the feasible set.

2. Geometric and Statistical Properties

OT ambiguity sets possess rich geometric and statistical properties:

Monotonicity: If $\rho_1 \leq \rho_2$, then $\mathcal B_{\rho_1}^c(\mu_0) \subseteq \mathcal B_{\rho_2}^c(\mu_0)$ (Aolaritei et al., 2022).
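Compactness and Convexity: Wasserstein balls are weakly compact if the underlying metric space is compact. Convexity with respect to mixtures of measures holds for any cost $c$, because $\mu \mapsto W_c(\mu,\mu_0)$ is convex; convexity is also discussed under strict convexity of the ground cost in (Baheri, 2023).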
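Statistical Coverage: For $N$ i.i.d. samples from a light-tailed, $d$-dimensional true distribution $Q^*$ with empirical distribution $\widehat Q$, it holds that $\Pr[ W_c(Q^*, \widehat Q) \leq \epsilon_N ] \geq 1-\beta$ for a suitable radius $\epsilon_N = O(N^{-1/d})$, with a constant that depends on the confidence level $\beta$ (Esteban-Pérez et al., 2019, Chaouach et al., 2023).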
Dimensionality Effects: The “curse of dimensionality” manifests as a slow rate $N^{-1/d}$, whereas structured (e.g., coordinate-wise) sets achieve $N^{-1/d_{\max}}$ rates, where $d_{\max} = \max_k d_k$ is the largest component dimension. When the component dimensions $d_k$ are small, this mitigates over-conservatism in high dimension (Chaouach et al., 2023, Chaouach et al., 9 Apr 2025).
Structural Information: OT sets can encode independence, block structure, or expert knowledge (e.g., monotonicity via order-cones (Esteban-Pérez et al., 2019)), offering sharper and often less conservative uncertainty quantification.
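To make the rates in the list above concrete, the following sketch compares how many samples are needed to reach a target radius under the $N^{-1/d}$ and $N^{-1/d_{\max}}$ scalings. The leading constant is a hypothetical placeholder set to one, and logarithmic factors and the dependence on the confidence level are ignored, so only the orders of magnitude are meaningful.

```python
def radius_for_samples(num_samples, dim, constant=1.0):
    """Radius scaling constant * N ** (-1 / dim); `constant` is a placeholder."""
    return constant * num_samples ** (-1.0 / dim)


def samples_for_radius(target_radius, dim, constant=1.0):
    """Invert the scaling: number of samples needed to reach `target_radius`."""
    return (constant / target_radius) ** dim


full_dim, max_block_dim = 10, 2  # ten scalar features versus blocks of size at most two
n_full = samples_for_radius(0.1, full_dim)
n_structured = samples_for_radius(0.1, max_block_dim)
```

Under these assumptions a radius of 0.1 needs on the order of $10^{10}$ samples in dimension ten but only about one hundred when the largest block has dimension two.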

A key property is the closure under transformations: for measurable $f$, $f_\# (\mathcal B_\rho^c(\mu_0)) \subseteq \mathcal B_\rho^{d}(f_\#\mu_0)$ for a suitable cost $d$ induced by $f$, leading to analytical tractability in control and inference (Aolaritei et al., 2022, Wu et al., 2022).
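For instance, when $c(x,y)=\|x-y\|^p$ and $f$ is $L$-Lipschitz, the inclusion can be made explicit by a short coupling argument. The following is a standard derivation under those two assumptions, not a result quoted from the cited works; here the original cost is kept and the radius is rescaled instead.

```latex
\begin{aligned}
&\text{Let } \pi \in \Pi(\mu,\mu_0). \text{ Then } (f\times f)_\#\pi \in \Pi(f_\#\mu, f_\#\mu_0), \text{ and} \\
&W_c(f_\#\mu, f_\#\mu_0)
  \le \int \|f(x)-f(y)\|^p \, d\pi(x,y)
  \le L^p \int \|x-y\|^p \, d\pi(x,y). \\
&\text{Taking the infimum over } \pi \text{ gives }
  W_c(f_\#\mu, f_\#\mu_0) \le L^p\, W_c(\mu,\mu_0), \\
&\text{hence } f_\#\bigl(\mathcal B_\rho^{c}(\mu_0)\bigr) \subseteq \mathcal B_{L^p\rho}^{c}(f_\#\mu_0).
\end{aligned}
```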

3. Duality, Reformulations, and Regularization

Most OT-ambiguity DRO problems admit strong dual characterizations, enabling tractable finite-dimensional reformulations for a broad class of loss functions. Typical dual forms optimize over nonnegative Lagrange multipliers (e.g., $\lambda$ for the Wasserstein budget and $\theta$ for additional constraint budgets), with the inner worst-case expectation becoming a minimization over these multipliers, and, when partition constraints or order cones are present, over variables in their dual cones (Esteban-Pérez et al., 2019, Chaouach et al., 2023, Chaouach et al., 9 Apr 2025).
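A standard special case illustrates how such a dual collapses to a closed form. Assume a type-1 Wasserstein ball with cost $|x-y|$ around an empirical distribution on the whole real line, and a convex piecewise-affine loss $\ell(\xi)=\max_k (a_k\xi+b_k)$. The dual multiplier must then be at least $\max_k |a_k|$, and the worst-case expectation equals the empirical mean plus the radius times that Lipschitz constant. The function name and the example data below are illustrative assumptions.

```python
def worst_case_expectation(samples, slopes, intercepts, rho):
    """Worst-case mean of max_k(a_k * xi + b_k) over a 1-Wasserstein ball of radius rho.

    Assumes scalar uncertainty with unbounded support and cost |x - y|;
    under these assumptions the dual reduces to empirical mean + rho * Lipschitz constant.
    """
    def loss(xi):
        return max(a * xi + b for a, b in zip(slopes, intercepts))

    empirical = sum(loss(x) for x in samples) / len(samples)
    lipschitz = max(abs(a) for a in slopes)
    return empirical + rho * lipschitz


samples = [1.0, 2.0, 3.0]
slopes, intercepts = [0.0, 1.0], [0.0, -1.0]  # loss(xi) = max(0, xi - 1)
rho = 0.5
bound = worst_case_expectation(samples, slopes, intercepts, rho)

# A feasible perturbation: shift every sample by rho in the steepest direction.
shifted = [x + rho for x in samples]
attained = sum(max(0.0, x - 1.0) for x in shifted) / len(shifted)
```

In this example the shifted distribution lies at distance `rho` from the empirical one and attains the bound, which shows that the closed form is tight here.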

Regularization of OT programs is tightly linked to ambiguity in the cost function. For example, in semi-discrete OT, smoothing the dual objective via worst-case additive disturbances $z_i$, whose distribution ranges over an ambiguity set $\Theta$, yields primal regularizers $R_\Theta(\pi)$ based on $f$-divergences between the plan $\pi$ and the independent product measure (Taskesen et al., 2021).

The following table summarizes common ambiguity families and induced regularization:

Ambiguity Set on $z$	Dual Smoothing	Primal Regularizer
Singleton Gumbel / Entropic	Log-sum-exp (“softmax”)	KL divergence ($\lambda$-entropy)
Chebyshev Moment (zero mean, variance)	Quadratic or $\chi^2$ smoothing	Quadratic/$\chi^2$ penalty
Fréchet (marginals fixed, copula free)	Mixtures (e.g., Tsallis)	$f$ -divergence, generalized entropy

These connections unify perspectives from regularized OT, discrete choice models, and robust estimation (Taskesen et al., 2021).
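The first row of the table can be illustrated with the log-sum-exp smoothing of a maximum, whose gradient is the softmax. This is a generic sketch of that smoothing, not the semi-discrete OT dual itself; the parameter `temperature` stands in for the regularization weight (written $\lambda$ in the table), and the function names are assumptions.

```python
import math


def smooth_max(values, temperature):
    """temperature * log(sum(exp(v / temperature))), computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(values)
    total = sum(math.exp((v - top) / temperature) for v in values)
    return top + temperature * math.log(total)


def softmax(values, temperature):
    """Gradient of smooth_max with respect to values: a probability vector."""
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


scores = [1.0, 2.0, 4.0]
smoothed = smooth_max(scores, temperature=0.5)
choice_probabilities = softmax(scores, temperature=0.5)
```

The smoothed value always lies between `max(values)` and `max(values) + temperature * log(len(values))`, so a smaller temperature gives a tighter but less smooth approximation. The softmax weights coincide with multinomial logit choice probabilities, which is the link to discrete choice models.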

4. Structural and Decision-Aware OT Ambiguity Sets

To mitigate conservatism and computational intractability, OT ambiguity sets can be structured, decision-aware, or trainably parameterized:

Structured Sets: Hyperrectangles and multi-transport sets, defined via per-component Wasserstein radii, capture variable-wise uncertainty and enable improved statistical rates (Chaouach et al., 2023, Chaouach et al., 9 Apr 2025).
Order-Constrained Sets: Partition-based sets incorporate prior or expert information via polyhedral cones, e.g., monotonicity, strict ordering, or known probability ratios, realizable as linear constraints in the dual (Esteban-Pérez et al., 2019).
Loss-Aware Shaping: Recent frameworks learn the shape of the ground metric or the cost function $\kappa(\cdot,\cdot;\theta)$ via bilevel optimization, so that the OT ambiguity set is sensitive to downstream loss or risk function structure, reducing conservatism without sacrificing statistical coverage (Ohnemus et al., 16 Sep 2025).
Clustering: For high-dimensional product ambiguity sets, clustering reduces the size of the empirical center and complexity, while adjusting ambiguity radii to preserve coverage (Chaouach et al., 9 Apr 2025).

These approaches yield a family of families: OT ambiguity sets can be adaptively tailored to data geometry, loss sensitivity, independence structure, and domain constraints.

5. Applications in Distributionally Robust Optimization and Control

OT ambiguity sets underpin the modern DRO paradigm: $\inf_{x\in X} \sup_{Q \in \mathcal A} \mathbb E_Q[f(x,\xi)]$ with $\mathcal A$ an OT ambiguity set. Applications and problem classes include:

Stochastic Programming: Newsvendor, portfolio optimization, linear and quadratic programming (Ohnemus et al., 16 Sep 2025, Esteban-Pérez et al., 2019).
Control and MPC: Distributionally robust MPC propagates ambiguity sets through nonlinear dynamics using Lipschitz constants, preserving set inclusions and supporting robust stability proofs via supermartingale Lyapunov arguments (Wu et al., 2022, Aolaritei et al., 2022).
Chance Constraints: Distributionally robust chance constraints are reformulated via CVaR and conic programming, with explicit dual representations for general piecewise-concave or affine losses (Chaouach et al., 9 Apr 2025).
Partial Identification: In econometrics and fairness analysis, partial identification sets for parameters under moment models with incomplete data admit OT-based characterizations with tractable support functions (Fan et al., 20 Mar 2025).
Inverse Reinforcement Learning: Reward ambiguity in IRL is expressed as a Wasserstein ball over the space of reward functions, with compactness and convexity properties supporting robust learning strategies and centroid computation (Baheri, 2023).

These settings exploit the tractability of OT ambiguity sets: many variants yield finite convex programs scalable with problem dimension, sample size, and partition structure.
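The Lipschitz-based propagation mentioned for control can be sketched in a deliberately simplified setting: deterministic dynamics $x_{t+1} = f_t(x_t)$ with each $f_t$ being $L_t$-Lipschitz, a type-1 Wasserstein ball on the initial state, and no process noise. Under these assumptions the radius is multiplied by $L_t$ at every step. The function name is hypothetical, and the cited works may treat more general settings.

```python
def propagate_radius(initial_radius, lipschitz_constants):
    """Radii of 1-Wasserstein balls that contain the state distribution at each step.

    Assumes x_{t+1} = f_t(x_t) with f_t L_t-Lipschitz and no process noise,
    so that W_1(f#mu, f#nu) <= L_t * W_1(mu, nu) applies step by step.
    """
    radii = [initial_radius]
    for lipschitz in lipschitz_constants:
        radii.append(lipschitz * radii[-1])
    return radii


# Two contracting steps followed by one expanding step.
radii = propagate_radius(0.1, [0.5, 0.5, 2.0])
```

Contracting dynamics shrink the ambiguity radius over the horizon, whereas an expanding step inflates it, which is why the product of Lipschitz constants governs how conservative the propagated sets become.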

6. Computational Aspects and Rates of Convergence

Computational considerations for OT ambiguity sets vary by structural assumptions:

Wasserstein Ball: For Wasserstein balls centered at empirical distributions, the worst-case DRO problem reduces to a conic or linear program for suitable loss functions. However, in high dimensions the sample complexity and computation scale poorly (the radius shrinks only at the $O(N^{-1/d})$ rate), revealing the curse of dimensionality (Chaouach et al., 2023, Esteban-Pérez et al., 2019).
Structured Sets: By breaking ambiguity into low-dimensional factors (e.g., hyperrectangles), convergence accelerates to $O(N^{-1/d_{\max}})$, and optimization decomposes into parallel low-dimensional problems (Chaouach et al., 2023, Chaouach et al., 9 Apr 2025).
Regularization and Smoothing: Regularized semi-discrete OT via ambiguity in the dual penalty (e.g., entropic, quadratic, Tsallis) yields smooth objectives allowing fast convergence via SGD, with rates $O(1/T)$ for self-concordant cases (Taskesen et al., 2021).
Clustering and Quantization: Clustering empirical reference measures for structured sets (e.g., $k$-means) reduces program size, with careful adjustment of ambiguity radii to maintain statistical guarantees (Chaouach et al., 9 Apr 2025).
Loss-Aware Parameterization: Bilevel learning of ambiguity metrics via hypergradient descent, relying on implicit function theorems, supports end-to-end optimization of both set shape and decision variables (Ohnemus et al., 16 Sep 2025).
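The radius adjustment that accompanies clustering can be sketched with the triangle inequality for the 1-Wasserstein distance: if the compressed center is within transport cost $\delta$ of the empirical distribution, then a ball of radius $\rho+\delta$ around the compressed center contains the original ball of radius $\rho$. The sketch assumes scalar samples and uses equal-width binning as a simple stand-in for $k$-means; the function names are hypothetical, and the adjustment rule in the cited work may differ.

```python
def quantize(samples, num_bins):
    """Compress scalar samples into bin means.

    Returns (atoms, weights, transport_cost), where transport_cost is the
    average distance from each sample to its bin mean, an upper bound on the
    1-Wasserstein distance between the empirical and the compressed measure.
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / num_bins
    if width == 0.0:
        width = 1.0  # all samples identical: everything falls in bin 0
    bins = {}
    for x in samples:
        index = min(int((x - lo) / width), num_bins - 1)
        bins.setdefault(index, []).append(x)
    n = len(samples)
    atoms, weights, cost = [], [], 0.0
    for index in sorted(bins):
        members = bins[index]
        center = sum(members) / len(members)
        atoms.append(center)
        weights.append(len(members) / n)
        cost += sum(abs(x - center) for x in members) / n
    return atoms, weights, cost


def inflated_radius(radius, transport_cost):
    """Radius around the compressed center that still covers the original ball."""
    return radius + transport_cost


atoms, weights, cost = quantize([0.0, 0.2, 1.0, 1.2], num_bins=2)
new_radius = inflated_radius(0.3, cost)
```

The compressed center has two atoms instead of four, which shrinks the resulting program, at the price of a radius enlarged by the quantization error.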

A plausible implication is that, when structural properties or data-driven geometry are exploited, OT ambiguity sets enable scalable, less conservative, and interpretable uncertainty quantification in otherwise high-dimensional, sample-poor regimes.

7. Outlook and Research Directions

OT ambiguity sets constitute a central mathematical object in contemporary robust optimization and control. Key open avenues include:

Distributionally robust filtering and Bayesian inference via OT ambiguity propagation (Aolaritei et al., 2022).
Characterization of set invariance and stability in dynamical systems under propagated OT uncertainties (Wu et al., 2022).
Efficient algorithms for semi-discrete OT under new families of regularizations, and their connections to discrete choice models (Taskesen et al., 2021).
Systematic incorporation of domain-informed constraints (e.g., via order-cones, block structure) and data-driven, loss-aware shaping of ambiguity sets (Esteban-Pérez et al., 2019, Ohnemus et al., 16 Sep 2025).
Full empirical evaluation and benchmarking for robust inverse reinforcement learning, control, and econometric identification (Baheri, 2023, Fan et al., 20 Mar 2025).

As advanced by recent research, the flexibility, geometric insight, and statistical fidelity of OT-based ambiguity models establish them as a cornerstone for high-dimensional, decision-centric, and uncertainty-aware optimization.