Optimal-Transport DRO
- OT-DRO is a mathematical framework that defines an ambiguity set using Wasserstein distances to protect decisions against worst-case shifts in the data distribution.
- Its dual reformulation transforms infinite-dimensional problems into tractable convex programs, enabling practical and scalable robust optimization.
- Structured ambiguity sets and careful statistical calibration underpin improved performance in applications such as high-dimensional learning, supply-chain management, and engineering design.
Optimal-Transport Distributionally Robust Optimization (OT-DRO) is a mathematical framework for decision-making under uncertainty, in which the solution is protected against worst-case shifts of the data distribution within a set defined by optimal transport metrics such as Wasserstein distances. This paradigm has seen wide adoption across statistics, machine learning, operations research, and robust engineering design, as it enables rigorous quantification and control of distributional uncertainty using the geometric structure of the probability space.
1. Formulation and Ambiguity Set Construction
At the core of OT-DRO is the definition of an ambiguity set: the collection of plausible probability distributions around a nominal (e.g., empirical) estimate. The typical OT-DRO problem is given by

$$\min_{\theta \in \Theta} \; \sup_{Q \in \mathbb{B}_{\varepsilon}(\hat{P}_n)} \mathbb{E}_{Q}[\ell(\theta, \xi)],$$

where $\hat{P}_n$ is the empirical distribution based on the data, and $\mathbb{B}_{\varepsilon}(\hat{P}_n) = \{Q : W_c(Q, \hat{P}_n) \le \varepsilon\}$ is a Wasserstein "ball" of radius $\varepsilon$ around $\hat{P}_n$. The optimal transport cost is defined as

$$W_c(Q, P) = \inf_{\pi \in \Pi(Q, P)} \mathbb{E}_{\pi}[c(\xi, \xi')],$$

with cost function $c$ (typically a norm or squared norm) and $\Pi(Q, P)$ the set of couplings of $Q$ and $P$.
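For one-dimensional empirical distributions, the transport cost with $c(\xi, \xi') = |\xi - \xi'|$ can be computed directly; a minimal sketch using SciPy's `wasserstein_distance` (the samples and the shift size are illustrative, not from the cited works):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
nominal = rng.normal(0.0, 1.0, size=500)   # empirical (nominal) sample
shifted = nominal + 0.3                    # a translated alternative distribution

# 1-Wasserstein distance with cost c(x, y) = |x - y|.
# Translating a distribution by t costs exactly |t| under W1.
d = wasserstein_distance(nominal, shifted)

eps = 0.5
in_ball = d <= eps   # is the shifted law inside the Wasserstein ball of radius eps?
```

Checking `in_ball` against a candidate radius is exactly the membership test that defines the ambiguity set above.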
Extensions partition the support of and impose separate transport constraints over each partition, allowing the ambiguity set to encode additional knowledge (e.g., independence of components) (Chaouach et al., 2023, Chaouach et al., 9 Apr 2025), or even impose order cone constraints to reflect shape constraints or monotonicity (Esteban-Pérez et al., 2019). This flexibility is a defining strength of the optimal transport approach.
2. Dual Reformulation, Regularization, and Algorithmic Implications
Strong duality plays a central role in OT-DRO, allowing the infinite-dimensional inner supremum over measures to be replaced with a finite-dimensional minimization over dual variables. A generic result for expected loss minimization is

$$\sup_{Q : W_c(Q, \hat{P}_n) \le \varepsilon} \mathbb{E}_{Q}[\ell(\theta, \xi)] \;=\; \min_{\lambda \ge 0} \left\{ \lambda \varepsilon + \mathbb{E}_{\hat{P}_n}\big[\phi_{\lambda}(\theta, \hat{\xi})\big] \right\},$$

where $\phi_{\lambda}(\theta, \hat{\xi}) = \sup_{\xi} \{\ell(\theta, \xi) - \lambda\, c(\xi, \hat{\xi})\}$ (Blanchet et al., 2021, Blanchet et al., 2018).
For learning problems, e.g. linear/logistic regression where $\ell(\beta; x, y)$ is the loss with respect to parameter $\beta$, the dual reformulation often recovers standard regularized estimators: the Wasserstein constraint induces a dual norm penalty on $\beta$; schematically,

$$\min_{\beta} \; \mathbb{E}_{\hat{P}_n}[\ell(\beta; x, y)] + \varepsilon \,\|\beta\|_{*},$$

with the dual norm $\|\cdot\|_{*}$ depending on the metric in $c$ (Blanchet et al., 26 Jan 2024, Blanchet et al., 2017).
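As a hedged numerical sketch of this norm-penalty equivalence, assume an absolute loss and an $\ell_\infty$ transport cost on the features, so the induced penalty is the $\ell_1$ (dual) norm; the data, the radius `eps`, and the helper `dro_objective` are illustrative, not from the cited works:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

eps = 0.1  # Wasserstein radius

def dro_objective(beta):
    # empirical absolute loss + the dual-norm penalty induced by the OT constraint
    return np.mean(np.abs(y - X @ beta)) + eps * np.linalg.norm(beta, 1)

# Derivative-free solver, since the objective is nonsmooth at sign changes.
res = minimize(dro_objective, np.zeros(p), method="Powell")
beta_hat = res.x
```

The penalized surrogate is an ordinary finite-dimensional convex program, which is precisely the tractability payoff of the dual reformulation.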
Convexity and strong convexity properties are often "inherited" by the dual problem even if the original stochastic problem lacks strong curvature. As a concrete statement, for affine decision rules and locally strongly convex costs, the dual function is strongly convex in the dual variables, with a modulus governed by the local strong-convexity constant of the cost; this underpins linear convergence rates for SGD (Blanchet et al., 2018). Dual formulations also facilitate scalable stochastic optimization methods, often with per-iteration cost independent of sample size.
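The duality can be sanity-checked numerically for a simple 1-Lipschitz loss. The sketch below is illustrative: for $\ell(\xi) = |\xi|$ and a $W_1$ ball, the worst-case expectation has the closed form "empirical mean plus $\varepsilon$ times the Lipschitz constant", and a crude grid search over the dual variable $\lambda$ and the inner supremum should reproduce it:

```python
import numpy as np

rng = np.random.default_rng(2)
xi_hat = rng.normal(size=200)        # empirical sample (atoms of the nominal law)
eps = 0.25                           # Wasserstein-1 radius

# Dual objective:  lam*eps + (1/n) * sum_i sup_xi [ |xi| - lam*|xi - xi_hat_i| ]
xi_grid = np.linspace(-50.0, 50.0, 4001)   # crude grid for the inner sup

def dual_value(lam):
    inner = np.abs(xi_grid)[None, :] - lam * np.abs(xi_grid[None, :] - xi_hat[:, None])
    return lam * eps + inner.max(axis=1).mean()

lam_grid = np.linspace(0.0, 3.0, 301)
dual = min(dual_value(l) for l in lam_grid)

# For a 1-Lipschitz loss, the worst case over the W1 ball is E|xi_hat| + eps * 1.
closed_form = np.abs(xi_hat).mean() + eps
```

The minimizing $\lambda$ sits at the Lipschitz constant of the loss, which is the dual-variable interpretation behind the norm-penalty results above.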
3. Extensions: Structure, Shape, and Statistical Guarantees
OT-DRO can exploit known structure in uncertainty to improve statistical efficiency and interpretability. For instance:
- Structured ambiguity sets: When uncertainty decomposes into independent components, constructing a "multi-transport hyperrectangle" with independent Wasserstein constraints per component ensures faster finite-sample shrinkage of the ambiguity set. The statistical error decays with the maximum, rather than the sum, of component dimensions (Chaouach et al., 2023, Chaouach et al., 9 Apr 2025).
- Order cone constraints: Prior knowledge about the monotonicity or multimodality of the underlying distribution can be imposed through cone constraints on partition probabilities, which are incorporated as linear constraints in the reformulation. This reduces conservatism and the cost of robustness (Esteban-Pérez et al., 2019).
- Dimension reduction: The geometry of the support set (e.g., concentration of data on a lower-dimensional manifold) is directly linked to the decay rate of the optimal uncertainty set size, leading to more favorable generalization as the sample size increases (Blanchet et al., 2017). The decay rate of the worst-case radius depends on the intrinsic data dimension.
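The dimension dependence can be illustrated with back-of-the-envelope arithmetic, assuming the classical $n^{-1/d}$ empirical Wasserstein convergence rate for $d > 2$ (the block dimensions and sample size below are illustrative):

```python
def wasserstein_rate(n, d):
    # Classical empirical convergence rate E[W(P_hat_n, P)] ~ n**(-1/d) for d > 2
    return n ** (-1.0 / d)

n = 10_000
d1, d2 = 6, 4          # two independent blocks of the uncertainty
joint = wasserstein_rate(n, d1 + d2)        # one ball in dimension d1 + d2
structured = max(wasserstein_rate(n, d1),   # per-block balls: error driven by
                 wasserstein_rate(n, d2))   # the largest block dimension
```

The structured radius shrinks markedly faster, which is the "maximum rather than sum of component dimensions" effect described above.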
4. Statistical Calibration and Confidence Guarantees
Selection of the ambiguity set size (e.g., the Wasserstein ball radius $\varepsilon$) is nontrivial and is calibrated by statistical principles. The projection distance from the empirical distribution onto the set of measures satisfying first-order optimality conditions is used to derive a "Wasserstein Profile Function" $R_n(\theta)$, from which the minimal adversarial budget (radius) is chosen to guarantee coverage of the true minimizer with probability $1-\alpha$ (Blanchet et al., 2021). Explicitly, for quadratic transport cost,

$$\varepsilon_n = \frac{\eta_{1-\alpha}}{n},$$

where $\eta_{1-\alpha}$ is the $(1-\alpha)$ quantile of the profile function's limiting distribution.
The resulting OT-DRO estimators satisfy finite-sample and asymptotic confidence guarantees, sometimes even in high dimensions (i.e., avoiding the curse of dimensionality if structure is exploited). Central limit theorems and coverage properties for compatible confidence regions for the DRO estimator are established in this framework.
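In practice the radius is often calibrated with a resampling surrogate rather than the exact profile-function asymptotics. The bootstrap sketch below is illustrative only and is not the Wasserstein Profile Function construction of Blanchet et al.: it takes the $(1-\alpha)$ quantile of resampling drifts as a candidate radius:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)
data = rng.normal(size=300)   # observed one-dimensional sample
alpha, B = 0.05, 200

# Measure how far resampled empirical laws drift from the original in W1,
# then take the (1 - alpha) quantile of those drifts as a candidate radius.
dists = np.array([
    wasserstein_distance(data, rng.choice(data, size=data.size, replace=True))
    for _ in range(B)
])
eps_n = np.quantile(dists, 1.0 - alpha)
```

Such a radius inherits the $O(n^{-1/2})$ scale of one-dimensional empirical fluctuations, in line with the coverage heuristics above.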
5. Computational and Practical Aspects
Practical tractability of OT-DRO stems from the possibility of reformulating the ambiguity-constrained optimization as finite or even decomposable programs. In standard cases (piecewise affine losses, polyhedral uncertainty sets), the worst-case expectation admits a dual representation involving only a modest number of dual variables and convex constraints (Blanchet et al., 2018, Chaouach et al., 2023, Chaouach et al., 9 Apr 2025).
Scalability can be further improved by exploiting product or clustered empirical distributions when the uncertainty is separable—substantially reducing the computational burden relative to the exponential growth in scenario count with dimensionality (Chaouach et al., 2023, Chaouach et al., 9 Apr 2025). Algorithmic advances such as the use of multilevel Monte Carlo for unbiased stochastic gradient estimation and efficient inner maximization via line search or clustering are common.
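A hedged sketch of the clustered-scenario idea, using SciPy's k-means to compress the empirical support (the scenario count, dimension, and `k` are illustrative):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(4)
scenarios = rng.normal(size=(5000, 3))   # raw empirical scenarios
k = 50                                   # reduced support size

# Replace 5000 atoms by k centroids with cluster-frequency weights; the DRO
# reformulation then scales with k rather than the raw scenario count.
centroids, labels = kmeans2(scenarios, k, minit="++", seed=0)
weights = np.bincount(labels, minlength=k) / len(scenarios)
```

The weighted centroids define a coarse empirical distribution that can stand in for the full sample inside the ambiguity set, at the price of a controllable clustering error.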
Numerical experiments—including newsvendor, power dispatch, portfolio optimization, and strategic firm competition problems—consistently demonstrate that OT-DRO with structured ambiguity exhibits lower conservatism, better out-of-sample performance, and improved certificate reliability compared to classical methods.
6. Applications and Theoretical Implications
OT-DRO has been applied in a variety of contexts:
- Inventory and supply-chain optimization with shape-informed ambiguity (Esteban-Pérez et al., 2019).
- Strategic firm decisions under Cournot competition, where prior order information on probabilities is integrated (Esteban-Pérez et al., 2019).
- Engineering design (shape/topology optimization) under uncertain loading, using entropy-regularized Wasserstein balls for computationally tractable robust design (Dapogny et al., 2022).
- High-dimensional learning for robust regression, regularization, and semi-supervised settings where the support of the ambiguity set is shaped by unlabeled data (Blanchet et al., 2017).
- Statistical estimation and confidence sets for data-driven problems with rigorous nonasymptotic or asymptotic statistical coverage (Blanchet et al., 2021).
These applications leverage the flexibility of the OT ambiguity set, duality-based tractability, and the capacity to incorporate prior qualitative information directly into the uncertainty modeling.
7. Impact and Future Directions
OT-DRO brings together the geometric intuition of optimal transport with the rigorous foundations of robust optimization, enabling a unified treatment for a wide range of robust machine learning and statistical inference problems (Blanchet et al., 26 Jan 2024). Recent extensions include entropic regularization for computational gains (Azizian et al., 2022, Dapogny et al., 2022), sharpness-aware learning in model parameter space (Nguyen et al., 2023), and comprehensive unification with divergence-based approaches under an optimal-transport-theoretic framework (Blanchet et al., 2023).
Open directions include optimal design of ambiguity sets balancing statistical performance and computational tractability (e.g., via adaptive or decision-aware sets), further integration with metric learning, and dynamic or adaptive calibration of uncertainty size. The structural and practical advantages of OT-DRO—particularly with structured or shape-constrained ambiguity sets—offer fertile ground for robust data-driven decision-making in increasingly complex and high-dimensional environments.