
Wasserstein Balls & Robust Optimization

Updated 15 December 2025
  • Wasserstein balls are sets of probability distributions within a fixed optimal transport distance, used to model ambiguity in robust optimization.
  • They enable precise dual reformulations, facilitating tractable solutions in distributionally robust optimization and uncertainty quantification.
  • Key distinctions between 1- and 2-Wasserstein balls highlight trade-offs in computational complexity and smoothness in robust decision-making.

A Wasserstein ball is a set of probability distributions within a fixed Wasserstein (optimal transport) distance from a reference (typically empirical) distribution. The use of Wasserstein balls as model ambiguity sets underpins much of modern data-driven distributionally robust optimization (DRO), statistical learning, and uncertainty quantification. The structure, geometry, and computational properties of Wasserstein balls have been analyzed extensively, with particular attention to their dual reformulations, optimization theory, algorithmic tractability, and statistical guarantees. Key distinctions arise between 1-Wasserstein and 2-Wasserstein balls, impacting practical decision-making and optimization under uncertainty.

1. Mathematical Definition and Notation

Given a Polish metric space $(\Xi, d)$, let $\mathcal P(\Xi)$ denote the set of Borel probability measures on $\Xi$. For $p \geq 1$, the $p$-Wasserstein distance between $\mathbb P, \mathbb Q \in \mathcal P(\Xi)$ is

$$W_p(\mathbb P, \mathbb Q) = \left( \inf_{\pi \in \Pi(\mathbb P, \mathbb Q)} \int_{\Xi \times \Xi} d(\xi, \zeta)^p\, d\pi(\xi, \zeta) \right)^{1/p},$$

where $\Pi(\mathbb P, \mathbb Q)$ is the set of all couplings with marginals $\mathbb P$ and $\mathbb Q$. The associated Wasserstein $p$-ball of radius $\rho$ centered at reference $\mathbb P_0$ is

$$B_p(\mathbb P_0, \rho) = \left\{ \mathbb Q : W_p(\mathbb Q, \mathbb P_0) \leq \rho \right\}.$$

When $\mathbb P_0$ is empirical (e.g., $\mathbb P_0 = \frac{1}{N} \sum_{i=1}^N \delta_{\zeta^i}$), ambiguity sets of this form are widely used for robustification in stochastic programming and machine learning (Byeon et al., 2022, Yue et al., 2020).
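To make the definition concrete, here is a minimal sketch for the simplest case: two uniform empirical measures on the real line with the same number of atoms and $d(\xi, \zeta) = |\xi - \zeta|$. In that setting, matching the sorted samples is an optimal coupling for every $p \geq 1$, so $W_p$ has a closed form. The function names and the sample values are hypothetical and chosen only for illustration.

```python
def wasserstein_1d(xs, ys, p=1):
    """W_p between two uniform empirical measures on the real line.

    Assumes both samples have the same number of atoms (an assumption of
    this sketch, not a restriction of the general definition).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms in both samples")
    if p < 1:
        raise ValueError("p must be at least 1")
    # On the line, with cost |x - y|^p and p >= 1, pairing sorted atoms is optimal.
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)


def in_wasserstein_ball(candidate, center, rho, p=1):
    """Check whether `candidate` lies in the p-Wasserstein ball of radius rho around `center`."""
    return wasserstein_1d(candidate, center, p) <= rho


center = [0.0, 1.0, 2.0, 3.0]          # atoms of the empirical reference measure
shifted = [z + 0.5 for z in center]    # every atom moved by 0.5

print(wasserstein_1d(center, shifted, p=1))
print(in_wasserstein_ball(shifted, center, rho=0.4))
print(in_wasserstein_ball(shifted, center, rho=0.6))
```

In higher dimensions or with unequal weights no such shortcut is available, and the infimum over couplings has to be computed as a linear program.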

2. Duality and Reformulations in Optimization

A central result for 1-Wasserstein balls is the exact convex reformulation, by strong duality, of worst-case expectations of Lipschitz or convex functions over the ball. For a cost function $f(x, \xi)$, one has

$$\sup_{\mathbb P \in B_1(\mathbb P_0, \rho)} \mathbb E_{\mathbb P}[f(x, \xi)] = \inf_{\lambda \geq 0} \left\{ \lambda \rho + \mathbb E_{\zeta \sim \mathbb P_0} \left[ \sup_{\xi \in \Xi} \big(f(x,\xi) - \lambda d(\xi, \zeta)\big) \right] \right\},$$

with analogous versions for empirical centers (Byeon et al., 2022, Yue et al., 2020).
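As a rough numerical illustration of the dual formula above, the following sketch evaluates its right-hand side on the real line with $d(\xi, \zeta) = |\xi - \zeta|$ and an empirical center. The piecewise-linear loss, the sample atoms, the bounded discretized support, and the grid over $\lambda$ are all assumptions made for this example and are not taken from the cited papers; the decision $x$ is held fixed and omitted.

```python
def loss(xi):
    # Hypothetical convex piecewise-linear loss; its Lipschitz constant is 2.
    return max(2.0 * xi - 1.0, -0.5 * xi, 0.25)


def dual_objective(lam, rho, samples, support):
    # lam * rho + average over atoms zeta of max_xi [ f(xi) - lam * |xi - zeta| ]
    inner = [
        max(loss(xi) - lam * abs(xi - zeta) for xi in support)
        for zeta in samples
    ]
    return lam * rho + sum(inner) / len(samples)


def worst_case_expectation(rho, samples, support, lam_grid):
    # Crude minimization over a grid of dual multipliers (assumption: the
    # minimizer lies inside the grid).
    return min(dual_objective(lam, rho, samples, support) for lam in lam_grid)


samples = [-1.0, 0.0, 0.5, 2.0]                 # atoms of the empirical center
# Discretized support on [-3, 3]; the atoms are included so that the inner
# maximum can always pick xi = zeta.
support = sorted(set(samples) | {-3.0 + 0.05 * k for k in range(121)})
lam_grid = [k / 100 for k in range(301)]        # lambda in [0, 3]
rho = 0.3

saa = sum(loss(z) for z in samples) / len(samples)
robust = worst_case_expectation(rho, samples, support, lam_grid)
print("sample average:", saa)
print("worst-case expectation (dual, discretized):", robust)
```

Because the loss is 2-Lipschitz, taking $\lambda = 2$ makes the inner supremum equal to $f(\zeta)$, so the dual value is at most the sample average plus $2\rho$; it is at least the sample average since $\xi = \zeta$ is always feasible in the inner problem. On a bounded support the worst case can be strictly below the upper bound.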

For $p = 2$, the analogous result deploys a quadratic penalization, often leading to copositive or semidefinite programming formulations in two-stage DRO (Hanasusanto et al., 2016, Byeon, 9 Jan 2025). The reformulation for 2-Wasserstein balls enables smoother dependence of solutions on the radius parameter and typically leads to more informative robust solutions in the presence of nonlinear recourse (Byeon, 9 Jan 2025).
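A small worked case may help show what the quadratic penalization looks like. The sketch below assumes the standard order-$p$ form of the dual, in which $\lambda \rho$ becomes $\lambda \rho^2$ and $d(\xi, \zeta)$ becomes $d(\xi, \zeta)^2$, and specializes it to the hypothetical loss $f(\xi) = \xi^2$ on $\Xi = \mathbb R$ with $d(\xi, \zeta) = |\xi - \zeta|$. For $\lambda > 1$ the inner supremum is $\lambda \zeta^2 / (\lambda - 1)$, and minimizing over $\lambda$ gives the closed form $(\sqrt{m_2} + \rho)^2$, where $m_2$ is the second moment of the center. The loss, samples, and grid search are illustrative choices, not the copositive or semidefinite machinery of the cited papers.

```python
import math


def dual_objective_w2(lam, rho, samples):
    # For f(xi) = xi^2: sup_xi [xi^2 - lam * (xi - z)^2] = lam * z^2 / (lam - 1)
    # when lam > 1; for lam <= 1 this sketch treats the supremum as +infinity.
    if lam <= 1.0:
        return math.inf
    second_moment = sum(z * z for z in samples) / len(samples)
    return lam * rho ** 2 + lam / (lam - 1.0) * second_moment


def worst_case_second_moment(rho, samples, steps=100000, lam_max=101.0):
    # Grid search over lambda in (1, lam_max]; assumes the minimizer
    # 1 + sqrt(m2) / rho falls inside that interval.
    grid = (1.0 + (lam_max - 1.0) * k / steps for k in range(1, steps + 1))
    return min(dual_objective_w2(lam, rho, samples) for lam in grid)


samples = [-1.0, 0.0, 0.5, 2.0]
m2 = sum(z * z for z in samples) / len(samples)

for rho in (0.3, 1.0):
    numeric = worst_case_second_moment(rho, samples)
    closed_form = (math.sqrt(m2) + rho) ** 2
    print(rho, numeric, closed_form)
```

The worst-case value grows smoothly in $\rho$ here, which is consistent with the smoother dependence on the radius described above.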

3. Structure and Geometry of Wasserstein Balls

The geometric structure of Wasserstein balls depends strongly on the underlying metric and the value of $p$.

  • For discrete spaces, the unit Wasserstein ball is the polar of the associated Lipschitz polytope, with boundaries determined by the geometry of the ground metric graph (Çelik et al., 2020).
  • In $\mathbb R^d$, Wasserstein balls are convex, weakly compact sets under mild moment and separability conditions (Yue et al., 2020, Zyl, 2019). If centered at a discrete measure with $N$ atoms, any maximizer of a linear objective over the Wasserstein ball is supported on at most $N+1$ atoms (Yue et al., 2020).
  • For $p = 2$ and Gaussian centers, Wasserstein balls in $\mathbb R^d$ correspond to explicit ellipsoidal sets in moment space due to the closed-form formula for $W_2$ between Gaussians (Nguyen et al., 2019).

Geometric and combinatorial complexity, such as the number of supporting faces or algebraic degree, can be analyzed directly in finite settings and has implications for algorithmic tractability (Çelik et al., 2020).
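The closed-form $W_2$ distance between Gaussians mentioned above is easiest to see when both covariance matrices are diagonal. With the Euclidean ground metric, the general expression $\|m_a - m_b\|^2 + \operatorname{tr}\big(\Sigma_a + \Sigma_b - 2(\Sigma_b^{1/2} \Sigma_a \Sigma_b^{1/2})^{1/2}\big)$ for $W_2^2$ then reduces to a sum of squared differences of means and of standard deviations. The sketch below covers only this diagonal special case; the function name and the numbers are hypothetical.

```python
import math


def w2_gaussian_diagonal(mean_a, std_a, mean_b, std_b):
    """W_2 between N(mean_a, diag(std_a^2)) and N(mean_b, diag(std_b^2)).

    Valid for diagonal covariances and the Euclidean ground metric only.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    std_term = sum((s - t) ** 2 for s, t in zip(std_a, std_b))
    return math.sqrt(mean_term + std_term)


# Membership in a 2-Wasserstein ball around a Gaussian center then becomes a
# Euclidean-ball condition on the stacked (mean, standard deviation) vector.
center_mean, center_std = [0.0, 0.0], [1.0, 1.0]
candidate_mean, candidate_std = [0.3, -0.4], [1.0, 1.5]
rho = 1.0

distance = w2_gaussian_diagonal(center_mean, center_std, candidate_mean, candidate_std)
print(distance, distance <= rho)
```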

4. Algorithmic Aspects and Solution Methods

Optimization over Wasserstein balls admits several algorithmic strategies, shaped by the convexity and structure of the ambiguity set and by the function class.

  • For 1-Wasserstein balls and convex piecewise-linear or Lipschitz costs, interior-point and cutting plane methods exploit strong duality and the finite-support structure of worst-case distributions (Byeon et al., 2022, Chen et al., 2018).
  • For 2-Wasserstein balls, copositive or semidefinite program hierarchies provide systematic tractable inner-approximations. Exactness may be recovered under complete-recourse assumptions; in general, the dual variable $\lambda$ acts as a transport-penalty parameter (Hanasusanto et al., 2016, Byeon, 9 Jan 2025).
  • Minimum cross-entropy projections onto Wasserstein balls are globally solvable via quasi-concave duals and cutting-plane methods in low dimensions (Vargas et al., 2021).
  • For settings with product structure (i.i.d. components), nonconvex "structured" Wasserstein sets admit increasingly sharp convex relaxations via symmetrization and lifting, with theoretical guarantees of convergence (Kharitenko et al., 30 Mar 2025).
  • In adversarial robustness and certification, Wasserstein balls in image space are transformed via affine flows into $L_1$ balls or polytopes, enabling adaptation of standard verification algorithms (Wegel et al., 2021).

Computational complexity may be linear, polynomial, or exponential in the number of samples, depending on the problem structure, the function class, and the nature of the Wasserstein ball (order $p = 1$ vs. $p = 2$, support constraints, etc.).

5. Statistical Guarantees and Calibration

Wasserstein balls have strong theoretical support as ambiguity sets in data-driven settings.

  • Finite-sample coverage: Under light-tail and moment conditions, the true distribution lies in the Wasserstein ball around the empirical measure with high probability, with explicit rates on the radius (Lee et al., 2017, Jiang et al., 2019, Ibrahim et al., 4 Oct 2024).
  • Minimax guarantees: The minimax value over the Wasserstein ball coincides (up to statistical error) with the true worst-case risk, with sharp generalization and excess-risk bounds in terms of the complexity of the hypothesis class (e.g., covering numbers) (Lee et al., 2017).
  • In covariate-shift and domain adaptation, Wasserstein balls centered at distributional estimators (kernel, parametric, or combinations/intersections thereof) yield improved test-time guarantees and adaptivity to both covariate and label shifts (Wang et al., 4 Jun 2024, Selvi et al., 18 Jul 2024).

Choosing the radius is crucial: explicit concentration rates or empirical cross-validation are commonly used.
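As a hedged illustration of a rate-based choice, the sketch below uses the scaling $\rho_N = C \cdot N^{-1/\max(d, 2)}$, an order commonly quoted for the 1-Wasserstein distance between a distribution on $\mathbb R^d$ and its $N$-sample empirical measure (ignoring a logarithmic factor at $d = 2$). The constant $C$ (`scale` below) is a hypothetical tuning parameter; it is not specified by the concentration results and is in practice often set by cross-validation.

```python
def radius_schedule(n_samples, dim, scale=1.0):
    """Heuristic radius rho_N = scale * N^(-1 / max(dim, 2)).

    `scale` is an assumed tuning constant, not a value given by theory.
    """
    if n_samples < 1 or dim < 1:
        raise ValueError("n_samples and dim must be positive")
    return scale * n_samples ** (-1.0 / max(dim, 2))


# The radius shrinks much more slowly in higher dimension.
for dim in (1, 3, 10):
    print(dim, [round(radius_schedule(n, dim), 4) for n in (100, 1000, 10000)])
```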

6. Applications and Model Structures

Wasserstein-ball ambiguity sets appear in a variety of distributionally robust formulations:

  • Two-stage stochastic/conic programs: Exact reformulations or copositive relaxations over Wasserstein balls balance tractability and model fidelity. Notably, 2-Wasserstein balls allow smooth robustification, while 1-Wasserstein balls yield sample-average plus linear penalty and may exhibit pathological all-or-nothing behavior (Hanasusanto et al., 2016, Byeon et al., 2022, Byeon, 9 Jan 2025).
  • Chance-constrained and mixed-integer programs: Ambiguous chance constraints over Wasserstein balls admit exact or inner convex approximations (CVaR-based, Bonferroni, expected-violation), with tightness and computational implications distinctly depending on the approximation technique and ball structure (Chen et al., 2018, Chen et al., 2022).
  • Federated and decentralized DRO: Mixture-of-Wasserstein-balls ambiguity sets allow for decentralized optimization and separability, with high-probability coverage and weighted composition of local distributions (Ibrahim et al., 4 Oct 2024).
  • Minimax learning and domain adaptation: Wasserstein balls provide principled ambiguity sets for robust risk minimization, generalization under distribution shift, and out-of-distribution adaptation (Lee et al., 2017, Wang et al., 4 Jun 2024).
  • Adversarial robustness: Certification and attack methods for Wasserstein-bounded perturbations are now standard, leveraging the geometric properties of Wasserstein balls (Wegel et al., 2021, Selvi et al., 18 Jul 2024).

7. Core Distinctions: 1-Wasserstein vs. 2-Wasserstein Balls

A central structural distinction is documented for two-stage DRO and related settings:

| Ball Type | Computational Form | Robustification Behavior | Out-of-Sample Performance | Recourse (Second Stage) |
| --- | --- | --- | --- | --- |
| 1-Wasserstein | Linear (LP/MILP) | SAA + linear penalty, non-smooth/adaptive | May be pathological: SAA/RO jump | Sensitive to support/extremal |
| 2-Wasserstein | Copositive (SDP) | Smooth penalty, continuous adaptation | Smoother, more stable, better OOS | Robustification interpolates |

For right-hand-side uncertainty, 1-Wasserstein balls may yield worst-case distributions by moving vanishing mass to infinity, leading to invariant first-stage solutions over a range of radii ("pathological behavior") (Byeon, 9 Jan 2025). In contrast, 2-Wasserstein balls produce smooth dependence on the ambiguity parameter, more interpretable robust shifts, and improved practical performance, at the cost of higher computational burden (Byeon, 9 Jan 2025, Hanasusanto et al., 2016, Byeon et al., 2022).
