Weak Optimal Transport Overview

Updated 27 November 2025

Weak optimal transport is a generalized framework where cost functions depend nonlinearly on conditional probability distributions.
It establishes strong duality with well-defined primal and dual formulations and ensures optimal plan stability through cyclical monotonicity.
Applications span economics, finance, and data science, with computational methods including mirror descent and neural approximations.

Weak optimal transport (WOT) is a generalization of the classical Monge–Kantorovich optimal transport framework in which the transport cost between a source point and the target can depend nonlinearly, or even nonlocally, on the conditional law of the coupling. This broad variational framework, introduced by Gozlan, Roberto, Samson, and Tetali and further developed by many others, unifies and extends classical OT, barycentric transport, and martingale and entropic optimal transport. It provides new tools and perspectives for analysis, computation, economics, and probability.

1. Formal Framework and Problem Statement

Let $X, Y$ be Polish spaces, $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}_p(Y)$ with $p \geq 1$, and $\Pi(\mu, \nu)$ the set of couplings with marginals $\mu, \nu$. Each coupling $\pi \in \Pi(\mu,\nu)$ admits a disintegration $\pi(dx,dy) = \mu(dx)\,\pi_x(dy)$.

The weak optimal transport problem is defined for a measurable cost function $C : X \times \mathcal{P}_p(Y) \to [0,\infty]$ that is convex and lower semicontinuous (l.s.c.) in the second argument (for the weak or Wasserstein topology). The primal problem is:

$\mathrm{WOT}_C(\mu, \nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int_X C\big(x, \pi_x\big) \,\mu(dx).$

Classical OT corresponds to $C(x,\rho) = \int_Y c(x,y)\,\rho(dy)$.
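
For finitely supported measures the primal objective can be evaluated directly. The following is a minimal sketch, assuming $\mu$ has strictly positive weights and the coupling is stored as a matrix whose rows sum to $\mu$. The helper names (`conditional_laws`, `weak_transport_cost`, `linear_cost`) and the toy data are illustrative assumptions, not part of any library. It checks that the linear cost functional recovers the classical transport cost $\int c\,d\pi$.

```python
def conditional_laws(pi, mu):
    """Disintegrate a discrete coupling: pi_x = pi[i][.] / mu[i] (requires mu[i] > 0)."""
    return [[p / m for p in row] for row, m in zip(pi, mu)]


def weak_transport_cost(pi, mu, xs, cost):
    """Evaluate sum_i mu_i * C(x_i, pi_{x_i}) for a discrete coupling pi."""
    kernels = conditional_laws(pi, mu)
    return sum(m * cost(x, rho) for m, x, rho in zip(mu, xs, kernels))


def linear_cost(c, ys):
    """Classical OT cost functional C(x, rho) = sum_j c(x, y_j) * rho_j."""
    return lambda x, rho: sum(c(x, y) * r for y, r in zip(ys, rho))


# Toy data (hypothetical): two source points, two target points.
xs, ys = [0.0, 1.0], [0.0, 2.0]
mu = [0.5, 0.5]
pi = [[0.4, 0.1], [0.1, 0.4]]  # rows sum to mu, columns sum to nu = [0.5, 0.5]


def sq(x, y):
    """Quadratic ground cost c(x, y) = (x - y)^2."""
    return (x - y) ** 2


weak_value = weak_transport_cost(pi, mu, xs, linear_cost(sq, ys))
classical_value = sum(
    pi[i][j] * sq(xs[i], ys[j]) for i in range(len(xs)) for j in range(len(ys))
)
```

Passing a different `cost` callable (for example one that depends on the mean of `rho`) turns the same evaluator into a genuinely weak transport cost.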

Duality

The dual problem involves pairs $(f,g) \in L^1(\mu)\times L^1(\nu)$, called admissible if, for all $x \in X$ and all $\rho \in \mathcal{P}_p(Y)$ with $g \in L^1(\rho)$:

$f(x) + \rho(g) \leq C(x, \rho).$

The dual value is:

$D_C(\mu, \nu) = \sup\{ \mu(f) + \nu(g) : (f,g)\ \text{admissible} \}.$

For fixed $g$, one defines

$g^C(x) := \inf_{\rho \in \mathcal{P}_p(Y),\,g \in L^1(\rho)} \{ C(x,\rho) - \rho(g) \},$

so that

$D_C(\mu, \nu) = \sup_{g \in L^1(\nu)} \{ \mu(g^C) + \nu(g) \}.$
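
As a consistency check, consider the classical cost $C(x,\rho) = \int_Y c(x,y)\,\rho(dy)$ and assume all Dirac masses $\delta_y$ are admissible in the infimum. An average is never smaller than the infimum of the integrand, and Dirac masses approach that infimum, so

$g^C(x) = \inf_{\rho} \int_Y \big( c(x,y) - g(y) \big)\,\rho(dy) = \inf_{y \in Y} \{ c(x,y) - g(y) \}.$

This is the usual $c$-transform, and $D_C(\mu,\nu)$ reduces to the Kantorovich dual of classical OT.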

Key Assumptions

Lower boundedness: There exist $a_\ell \in L^1(\mu)$, $b_\ell \in L^1(\nu)$ such that $C(x,\rho) \geq a_\ell(x) + \rho(b_\ell)$.
Growth: There exist measurable $a, b$ and a convex, increasing, super-coercive function $h$ such that $C(x,\rho) \leq a(x) + \rho(b) + \int h(d\rho/d\nu)\,d\nu$.
Truncation continuity: If $Y_k \uparrow Y$, then $C(x,\rho) \geq \limsup_{k \to \infty} C\big(x, \rho|_{Y_k}/\rho(Y_k)\big)$.

Fundamental Theorem

Under these conditions:

Primal attainment: the infimum is attained.
Strong duality: $\mathrm{WOT}_C(\mu, \nu) = D_C(\mu, \nu)$.
Dual attainment: holds as well under suitable growth and truncation-continuity conditions.
Complementary slackness: for a primal optimizer $\pi$ and a dual optimizer $(f,g)$, $C(x, \pi_x) = f(x) + \pi_x(g)$ $\mu$-almost surely (Beiglböck et al., 27 Jan 2025).

2. Principal Examples and Recoveries

Barycentric and Convex Costs

Barycentric costs take the form $C(x,\rho) = h(x - m_\rho)$ with $h$ convex and $m_\rho = \int y\,\rho(dy)$ the mean of $\rho$. For the quadratic choice $h(k)=|k|^2$ (barycentric quadratic cost):

$\mathrm{WOT}_C(\mu,\nu) = \inf_{\eta \leq_c \nu} W_2^2(\mu,\eta)$

where $\leq_c$ is the convex order.

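For the linear-growth choice $h(k)=|k|$, the dual becomes $\sup\{ \mu(\phi) - \nu(\phi) : \phi \text{ convex, 1-Lipschitz} \}$. For general convex $h$, the term $\mu(\phi)$ is replaced by $\mu(Q_h\phi)$, where $Q_h\phi(x) = \inf_{y}\{\phi(y) + h(x-y)\}$ and $\phi$ ranges over convex Lipschitz functions bounded from below.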

Complementary slackness yields a barycentric map $T(x) = m_{\pi_x}$ characterized by subgradient conditions (Beiglböck et al., 27 Jan 2025, Cazelles et al., 2021, Guo et al., 26 Nov 2025).
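
The barycentric map and the role of the convex order can be illustrated on a small discrete example. This is a sketch under stated assumptions: one-dimensional points, strictly positive weights for $\mu$, and hypothetical helper names (`barycentric_map`, `barycentric_cost`, `integral`). It compares the product coupling with a martingale coupling and checks, via Jensen's inequality on the test functions $t \mapsto |t-k|$, that the image measure $\eta = T_\#\mu$ is dominated by $\nu$ in convex order.

```python
def barycentric_map(pi, mu, ys):
    """T(x_i) = mean of the conditional law pi_{x_i} (1D targets, mu_i > 0)."""
    return [sum(p * y for p, y in zip(row, ys)) / m for row, m in zip(pi, mu)]


def barycentric_cost(pi, mu, xs, ys, h=lambda k: k * k):
    """sum_i mu_i * h(x_i - T(x_i)); h defaults to the quadratic h(k) = k^2."""
    T = barycentric_map(pi, mu, ys)
    return sum(m * h(x - t) for m, x, t in zip(mu, xs, T))


def integral(points, weights, phi):
    """Integrate phi against the discrete measure sum_i weights_i * delta_{points_i}."""
    return sum(w * phi(p) for p, w in zip(points, weights))


# Toy data (hypothetical): mu is dominated by nu in convex order.
xs, mu = [-1.0, 1.0], [0.5, 0.5]
ys, nu = [-2.0, 0.0, 2.0], [0.25, 0.5, 0.25]

independent = [[m * n for n in nu] for m in mu]      # product coupling
martingale = [[0.25, 0.25, 0.0], [0.0, 0.25, 0.25]]  # each row has mean x_i

T_ind = barycentric_map(independent, mu, ys)   # both conditional means are 0
T_mart = barycentric_map(martingale, mu, ys)   # conditional means equal x_i

cost_ind = barycentric_cost(independent, mu, xs, ys)
cost_mart = barycentric_cost(martingale, mu, xs, ys)

# Jensen check: eta = T#mu integrates convex functions no higher than nu does.
jensen_ok = all(
    integral(T_ind, mu, lambda t, k=k: abs(t - k))
    <= integral(ys, nu, lambda t, k=k: abs(t - k)) + 1e-12
    for k in (-2.0, -1.0, 0.0, 1.0, 2.0)
)
```

In this example the martingale coupling has zero barycentric cost, consistent with $\inf_{\eta \leq_c \nu} W_2^2(\mu,\eta) = 0$ when $\mu \leq_c \nu$.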

Entropic and Martingale OT

For

$C(x,\rho) = \int c(x,y)\,\rho(dy) + H(\rho \mid \nu),$

the WOT problem reduces to entropic OT.
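
The reduction follows from the chain rule for relative entropy. For any $\pi \in \Pi(\mu,\nu)$ with disintegration $\pi(dx,dy) = \mu(dx)\,\pi_x(dy)$,

$\int_X H(\pi_x \mid \nu)\,\mu(dx) = H(\pi \mid \mu \otimes \nu),$

so that, with the regularization weight normalized to one as in the cost above,

$\int_X C(x,\pi_x)\,\mu(dx) = \int_{X \times Y} c(x,y)\,\pi(dx,dy) + H(\pi \mid \mu \otimes \nu).$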

If additionally $C(x, \rho) = +\infty$ unless $\int y\,\rho(dy) = x$, one imposes a martingale condition, leading to weak martingale optimal transport (WMOT). Duality and structural results extend to the entropic-martingale context (Beiglböck et al., 27 Jan 2025, Chung et al., 2021, Carlier et al., 20 Nov 2025).

Hybrid Problems

Mixing barycentric and martingale/entropic constraints or costs yields continuous families interpolating between classical OT, martingale OT, and entropic OT, all captured within the WOT framework (Beiglböck et al., 27 Jan 2025, Guo et al., 26 Nov 2025).

3. Cyclical Monotonicity and Structural Optimality

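Optimal weak transport plans are characterized by a form of cyclical monotonicity. A coupling $\pi$ is $C$-monotone if there is a set $\Gamma \subseteq X$ with $\mu(\Gamma) = 1$ such that, for every finite family $x_1,\dots,x_N \in \Gamma$ with conditional laws $p_i = \pi_{x_i}$ and all competitor measures $q_1,\dots,q_N$ satisfying $\sum_i p_i = \sum_i q_i$: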

$\sum_i C(x_i, p_i) \leq \sum_i C(x_i, q_i).$

Necessity and sufficiency of this condition (under extra regularity such as $C$ being Lipschitz in the measure variable) provide a direct generalization of classical cyclical monotonicity (Veraguas et al., 2018, Backhoff-Veraguas et al., 2019). This also underpins the stability theory: optimal plans are stable under perturbations of marginals or cost, given the adapted topology (which metrizes joint weak convergence of marginals and conditional laws) (Backhoff-Veraguas et al., 2019, Beiglböck et al., 2021).
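
To see how this extends the classical notion, take the linear cost $C(x,\rho)=\int c(x,y)\,\rho(dy)$ with Dirac conditional laws $p_i = \delta_{y_i}$ and competitors $q_i = \delta_{y_{\sigma(i)}}$ for a permutation $\sigma$ (a special case chosen here for illustration). Then $\sum_i p_i = \sum_i q_i$ holds automatically and the inequality reads

$\sum_i c(x_i, y_i) \leq \sum_i c(x_i, y_{\sigma(i)}),$

which is classical $c$-cyclical monotonicity.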

4. Dynamic, Martingale, and PDE Connections

WOT admits a dynamic (PDE) characterization generalizing the Benamou–Brenier formula:

The static weak transport problem is equivalent to a dynamic minimization over curves $(\varrho_t, \lambda_t)$ solving a generalized Fokker–Planck equation with a (possibly measure-valued) diffusion tensor and a convex cost integrand (Bulanyi, 2023).
Barycentric WOT can be described dynamically using drift–diffusion SDEs, with cost determined by the drift term, and further extended to martingale settings where the drift vanishes and the cost penalizes only covariance (Guo et al., 26 Nov 2025).

This establishes equivalence between static (coupling) and dynamic (PDE/SDE) perspectives for broad classes of convex costs.
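
For reference, the classical Benamou–Brenier formula that these dynamic formulations generalize reads as follows for $\mu, \nu$ with finite second moments on $\mathbb{R}^d$. This is the standard statement; the velocity field $v_t$ is notation introduced here.

$W_2^2(\mu,\nu) = \inf \Big\{ \int_0^1 \int_{\mathbb{R}^d} |v_t(x)|^2\,\varrho_t(dx)\,dt \ :\ \partial_t \varrho_t + \nabla \cdot (\varrho_t v_t) = 0,\ \varrho_0 = \mu,\ \varrho_1 = \nu \Big\}.$

In the weak setting the continuity equation is replaced by a Fokker–Planck equation with an additional diffusion term. Its precise form is given in the cited works and is not reproduced here.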

5. Computational Methods and Algorithms

Efficient computation of WOT is challenging due to nonlinearity and complexity of the transport constraints.

Mirror descent methods: For barycentric and unnormalized-kernel variants (WOTUK), primal and dual variants of mirror descent with entropy mirrors (KL divergence) and Sinkhorn projection are provably convergent and scalable (Paty et al., 2022).
Neural approaches: Neural parameterizations of stochastic transport maps can approximate any WOT plan and can be optimized via a max–min (saddle-point) objective; this framework accommodates high-dimensional, nonlinear, and stochastic transport settings (Korotin et al., 2022).

These algorithms have been validated in economics (matching models), machine learning (distributional alignment, barycenters), and vision (image translation).
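
The mirror descent idea can be sketched for the quadratic barycentric cost on finitely supported measures. This is a generic entropic mirror descent under stated assumptions (one-dimensional points, fixed step size, a fixed number of Sinkhorn sweeps for the KL projection), not the specific algorithm of the cited works. All function names and parameter values are hypothetical.

```python
import math


def sinkhorn_projection(K, mu, nu, iters=200):
    """KL-project a positive matrix K onto couplings of (mu, nu) by row/column scaling."""
    n, m = len(mu), len(nu)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def conditional_means(pi, mu, ys):
    """m_i = mean of the conditional law pi_{x_i} (1D targets, mu_i > 0)."""
    return [sum(p * y for p, y in zip(row, ys)) / w for row, w in zip(pi, mu)]


def barycentric_cost(pi, mu, xs, ys):
    """F(pi) = sum_i mu_i * (x_i - m_i)^2."""
    means = conditional_means(pi, mu, ys)
    return sum(w * (x - t) ** 2 for w, x, t in zip(mu, xs, means))


def mirror_descent(mu, nu, xs, ys, step=0.5, iters=200):
    """Multiplicative (KL-mirror) update followed by a Sinkhorn projection."""
    pi = [[a * b for b in nu] for a in mu]  # start from the product coupling
    for _ in range(iters):
        means = conditional_means(pi, mu, ys)
        # Gradient of F: dF/dpi_ij = 2 * (m_i - x_i) * y_j.
        K = [
            [pi[i][j] * math.exp(-step * 2.0 * (means[i] - xs[i]) * ys[j])
             for j in range(len(ys))]
            for i in range(len(xs))
        ]
        pi = sinkhorn_projection(K, mu, nu)
    return pi


# Toy data (hypothetical): mu is dominated by nu in convex order, so the optimum is 0.
xs, mu = [-1.0, 1.0], [0.5, 0.5]
ys, nu = [-2.0, 0.0, 2.0], [0.25, 0.5, 0.25]

initial_cost = barycentric_cost([[a * b for b in nu] for a in mu], mu, xs, ys)
plan = mirror_descent(mu, nu, xs, ys)
final_cost = barycentric_cost(plan, mu, xs, ys)
```

The multiplicative update keeps the plan strictly positive, so the iterates approach a boundary optimum (here a martingale coupling) only in the limit. Step size and iteration counts would need tuning for larger problems.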

6. Weak Barycenters and Generalizations

Weak barycenters, defined via minimization of sums of WOT costs over a family of laws, generalize Wasserstein barycenters. Characterization and computation exploit the structure of convex ordering:

Existence: Tightness and lower semicontinuity arguments guarantee minimizers under moment conditions (Cazelles et al., 2021).
Characterization: Weak barycenters extract common geometric/latent information and have robustness advantages compared to classical barycenters.
Algorithms: Deterministic (fixed-point), stochastic (streaming), and optimization (proximal gradient) methods are available.

Open problems concern uniqueness (especially in higher dimensions), stability, and geometric properties.

7. Applications and Extensions

Economics: WOT captures nonlinear aggregation in matching models, labor assignment, and production economics, providing structural insights and richer matching patterns than OT (Paty et al., 2022).
Finance: WMOT models are fundamental for robust pricing under martingale constraints, with applications including the robust superhedging of options and VIX futures, and with proven stability under distributional uncertainty (Beiglböck et al., 2021).
Information Theory: Rate-distortion functions, Shannon bounds, and connections to the Schrödinger bridge are realized within the WOT setting (Zou et al., 16 Jan 2025).
Risk Measures: Convex risk measures with WOT penalties yield primal and dual representations, with computational schemes based on variational and neural optimization (Kupper et al., 2023).
Metric Geometry and Analysis: Extensions to barycentric costs, entropic regularizations, and transport with moment constraints expand the landscape of metric and probabilistic geometry (Carlier et al., 20 Nov 2025, Chung et al., 2021, Chung et al., 2019).

Table: Core Weak OT Paradigms

Cost Formulation	Characteristic Constraint	Classical Example
$C(x,\rho) = \int c(x,y)\,\rho(dy)$	Linear (classical OT)	Wasserstein distance
$C(x,\rho) = h\big(x-\int y\,\rho(dy)\big)$	Barycentric (convex order)	Brenier–Strassen
$C(x,\rho) = \cdots + H(\rho \mid \nu)$	Entropic/Schrödinger regularization	Entropic OT
$C(x,\rho) = +\infty$ unless $\int y\,\rho(dy) = x$	Martingale, mean-preserving ($E[Y \mid X]=X$)	Martingale couplings

This unified convex-analytic perspective recovers and extends key foundational results of optimal transport—duality, structure of optimizers, and characterization via potential functions or subgradients—and admits flexible hybridizations supporting applications in analysis, data science, and economics (Beiglböck et al., 27 Jan 2025, Guo et al., 26 Nov 2025, Choné et al., 2022).