Kantorovich–Rubinstein Duality Overview

Updated 8 December 2025
  • Kantorovich–Rubinstein duality is a fundamental theorem in optimal transport that equates the minimal cost of transferring one probability measure to another with a supremum over 1-Lipschitz functions.
  • Its dual formulation leverages convex analysis techniques like Fenchel–Rockafellar duality and Strassen's theorem to secure the existence of optimal couplings under minimal regularity conditions.
  • Recent extensions to arbitrary cost functions, vector measures, and categorical frameworks underscore its applicability in functional inequalities, machine learning, and statistical analysis.

The Kantorovich–Rubinstein duality is a foundational result in optimal transport theory. It establishes an equality between the minimal cost of transferring one measure to another under a given metric and the supremum of a linear functional over a class of Lipschitz test functions. It characterizes the metric structure of probability measures via transport plans and function space duality, thereby linking convex analysis, probability, and functional analysis. The dual formulation not only provides theoretical insight but also underlies many algorithmic and analytic advances in contemporary research.

1. Classical Formulation: Primal and Dual Problems

Let $(\Omega, d)$ be a bounded, locally compact Polish metric space, and let $\mu$, $\nu$ be Borel probability measures on $\Omega$. The set of admissible couplings is $\Gamma(\mu, \nu) = \{\pi \text{ on } \Omega \times \Omega : \pi \text{ has marginals } \mu \text{ and } \nu\}$, and the set of $1$-Lipschitz functions is $\mathrm{Lip}_1(\Omega) = \{g \in C(\Omega) : |g(x) - g(y)| \leq d(x, y) \text{ for all } x, y \in \Omega\}$.

Primal (Wasserstein-1):

$$W_1(\mu, \nu) = \inf_{\pi \in \Gamma(\mu, \nu)} \int_{\Omega \times \Omega} d(x, y) \, d\pi(x, y)$$

Dual (KR-norm):

$$\|\mu - \nu\|_{\mathrm{KR}} = \sup_{g \in \mathrm{Lip}_1(\Omega)} \int_\Omega g(x) \, d(\mu - \nu)(x)$$

The Kantorovich–Rubinstein duality theorem asserts that $W_1(\mu, \nu) = \|\mu - \nu\|_{\mathrm{KR}}$, with both the infimum and the supremum attained and with no additional regularity assumptions on $\mu$ or $\nu$ (Ciosmak, 2020, Gozlan et al., 2014, Bołbotowski et al., 30 Nov 2024, Bubenik et al., 8 Nov 2024, Rigo, 2019).
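As a small numerical illustration of the equality, the sketch below builds a feasible coupling and a feasible 1-Lipschitz potential for two discrete measures on the real line and checks that their values coincide, which certifies that both are optimal. The support `xs`, the weights `mu` and `nu`, and all function names are hypothetical choices made for this example. It relies on the standard one-dimensional facts that the monotone (north-west corner) coupling is optimal for the cost $|x - y|$ and that a potential whose slope is the sign of the CDF gap is optimal in the dual.

```python
from fractions import Fraction as Fr

# Hypothetical data: two probability vectors on a shared, sorted support in R.
xs = [Fr(0), Fr(1), Fr(3), Fr(6)]
mu = [Fr(1, 2), Fr(1, 4), Fr(1, 4), Fr(0)]
nu = [Fr(0), Fr(1, 4), Fr(1, 4), Fr(1, 2)]


def monotone_coupling(mu, nu):
    """North-west corner rule: returns {(i, j): mass}, a coupling of mu and nu."""
    a, b = list(mu), list(nu)
    plan, i, j = {}, 0, 0
    while i < len(a) and j < len(b):
        m = min(a[i], b[j])
        if m > 0:
            plan[(i, j)] = m
        a[i] -= m
        b[j] -= m
        if a[i] == 0:
            i += 1  # source atom i is exhausted
        else:
            j += 1  # target atom j is full
    return plan


def dual_potential(mu, nu):
    """Piecewise-linear g with slopes in {-1, 0, 1}, hence 1-Lipschitz on xs."""
    g, gap = [Fr(0)], Fr(0)
    for k in range(len(xs) - 1):
        gap += nu[k] - mu[k]           # F_nu(x_k) - F_mu(x_k)
        slope = (gap > 0) - (gap < 0)  # sign of the CDF gap
        g.append(g[-1] + slope * (xs[k + 1] - xs[k]))
    return g


plan = monotone_coupling(mu, nu)
g = dual_potential(mu, nu)

# Primal value: transport cost of the coupling under d(x, y) = |x - y|.
primal = sum(m * abs(xs[i] - xs[j]) for (i, j), m in plan.items())
# Dual value: integral of g against mu - nu.
dual = sum(gk * (m - n) for gk, m, n in zip(g, mu, nu))

print(primal, dual)
```

Exact rational arithmetic is used so that the two values can be compared without a tolerance.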

2. Proof Mechanisms: Convexity, Strassen's Theorem, and Choquet Theory

The duality is proved via several convex-analytic methods:

  • Fenchel–Rockafellar Duality: The transport cost functional is convex over couplings, and duality arises via Lagrange multipliers enforcing the marginal constraints. The supremum reduces to Lipschitz potentials subject to the constraint $|f(x) - f(y)| \leq d(x, y)$ (Gozlan et al., 2014, Rigo, 2019, Gover, 23 Jan 2025).
  • Strassen's Theorem: By formulating the dual in terms of convex cones of Lipschitz functions, Strassen's theorem gives a Choquet integral representation: any pair $(\mu, \nu)$ can be written as integrated extreme points (pairs of Dirac masses), which correspond via Choquet theory to optimal couplings (Ciosmak, 2020).
  • Attainment and Tightness: Compactness, lower semicontinuity of the cost, and tightness of couplings guarantee existence of optimizers in both primal and dual (Gozlan et al., 2014, Bołbotowski et al., 30 Nov 2024, Rigo, 2019).
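
For orientation, the easy half of the duality needs none of this machinery. A short sketch, using only the marginal property of couplings and the notation of Section 1: for any $g \in \mathrm{Lip}_1(\Omega)$ and any $\pi \in \Gamma(\mu, \nu)$,

$$\int_\Omega g \, d(\mu - \nu) = \int_{\Omega \times \Omega} \bigl(g(x) - g(y)\bigr) \, d\pi(x, y) \leq \int_{\Omega \times \Omega} d(x, y) \, d\pi(x, y).$$

Taking the supremum over $g$ and the infimum over $\pi$ gives $\|\mu - \nu\|_{\mathrm{KR}} \leq W_1(\mu, \nu)$. The methods listed above are what supply the reverse inequality and the attainment of both extrema.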

3. Generalizations: Cost Functions, Vector Measures, and Extensions

Arbitrary Transport Costs

For any lower-semicontinuous cost function $c : X \times Y \to [0, \infty]$, the duality takes the form:

$$\inf_{\pi \in \Gamma(\mu, \nu)} \iint c(x, y)\,d\pi(x, y) = \sup_{\psi, \varphi}\Bigl\{ \int \psi\,d\mu - \int \varphi\,d\nu \;\Big|\; \psi(x) - \varphi(y) \leq c(x, y)\;\forall x,y \Bigr\}$$

If $c(x, y)$ is, for instance, convex or related to weak transport metrics (Talagrand–Marton), a similar duality holds with constraints reflecting specific functional structures (Gozlan et al., 2014, Chung et al., 2019, Gover, 23 Jan 2025).
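The role of the constraint $\psi(x) - \varphi(y) \leq c(x, y)$ can be seen in a small finite example. The sketch below is an illustration under stated assumptions, not a solver: the points `xs`, `ys`, the quadratic cost, and the trial potential `psi` are all hypothetical. It computes the primal value by brute force over permutations (sufficient for uniform marginals with equally many atoms), builds the tightest admissible `phi` for the chosen `psi`, and checks the weak-duality inequality "dual value ≤ primal value". It does not search for an optimal dual pair.

```python
from itertools import permutations

# Hypothetical data: uniform measures on four source and four target points.
xs = [0.0, 1.0, 2.0, 4.0]
ys = [0.5, 1.5, 3.0, 5.0]
n = len(xs)


def cost(x, y):
    """Quadratic cost, chosen only for illustration."""
    return (x - y) ** 2


# Primal: for uniform marginals on n atoms each, an optimal plan can be
# taken to be a permutation, so brute force over permutations suffices.
primal = min(
    sum(cost(xs[i], ys[p[i]]) for i in range(n)) / n
    for p in permutations(range(n))
)

# Dual: pick any psi, then take the smallest phi that keeps the pair
# admissible, phi(y) = max_x [psi(x) - c(x, y)].
psi = [0.0, 0.3, -0.2, 1.0]
phi = [max(psi[i] - cost(xs[i], ys[j]) for i in range(n)) for j in range(n)]

feasible = all(
    psi[i] - phi[j] <= cost(xs[i], ys[j]) + 1e-12
    for i in range(n)
    for j in range(n)
)
dual = sum(psi) / n - sum(phi) / n

print(primal, dual, feasible)
```

Any admissible pair gives a lower bound on the transport cost. Strong duality says that the best such bound is tight.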

Vector Measure Extensions

In the case of vector measures, the duality generalizes as follows: Given measure vectors $(\mu_1, \dots, \mu_n)$, $(\nu_1, \dots, \nu_n)$, the primal seeks minimization over vector-valued plans, and the dual maximizes over multi-potentials $(\psi_i, \varphi_i)$ subject to

$$\sum_i \bigl(\psi_i(x) + \varphi_i(y)\bigr)\,\eta_i(x, y) \leq c(x, y)$$

where $\eta_i$ are densities. Existence of minimizers and dual maximizers is preserved in this setting by abstract convexity (Gover, 23 Jan 2025).

Relative KR Duality

The theory extends to "relative" settings, where mass may be created or destroyed at a designated reservoir $A \subset X$. Here, the cost function is replaced by $\bar d(x, y) = \min\{d(x, y), d(x, A) + d(y, A)\}$, and the space of functions is $\mathrm{Lip}_0(X, A)$, consisting of functions vanishing on $A$ (Bubenik et al., 8 Nov 2024).
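To make the modified cost concrete, here is a minimal sketch on a finite subset of the real line. The point set `X`, the reservoir `A`, and the function names are hypothetical. It evaluates $\bar d$ and checks that the distance-to-reservoir function vanishes on $A$ and is 1-Lipschitz with respect to $\bar d$, so it is an admissible test function in the relative dual.

```python
# Hypothetical setup: X is a finite subset of the real line, A is the reservoir.
X = [0.0, 1.0, 4.0, 9.0, 10.0]
A = [0.0, 10.0]


def d(x, y):
    return abs(x - y)


def dist_to_A(x):
    """Distance from x to the reservoir A."""
    return min(d(x, a) for a in A)


def d_bar(x, y):
    """Relative cost: move mass directly, or route it through the reservoir."""
    return min(d(x, y), dist_to_A(x) + dist_to_A(y))


# Moving mass from 1 to 9 directly costs 8; via the reservoir it costs 1 + 1.
print(d(1.0, 9.0), d_bar(1.0, 9.0))

# f(x) = d(x, A) vanishes on A and is 1-Lipschitz with respect to d_bar.
f = dist_to_A
vanishes_on_A = all(f(a) == 0.0 for a in A)
is_lipschitz = all(abs(f(x) - f(y)) <= d_bar(x, y) for x in X for y in X)
```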

Families of Norms and Dual Spaces

Generalizations to $p$-Kantorovich and $q$-Lipschitz norms are developed for $p, q \in [1, \infty]$ (Hölder conjugates), yielding isometric dual spaces for measures and functions,

$$\|\mu\|_{\mathrm{KR},p} = \sup \Bigl\{ \int f\,d\mu : f \in \mathrm{Lip}(X),\;\|f\|_{\mathrm{Lip},q} \le 1 \Bigr\}$$

extending classical KR duality to broader Banach spaces (Terjék, 2021).

4. Analogues: Martingale and Second-Order Dualities

Martingale KR Duality

In the martingale transport problem, the primal minimizes over martingale couplings $\pi$ (i.e., $E[Y \mid X = x] = x$), and the dual maximizes over pairs $(f_1, f_2)$, together with a function $\gamma$, constrained by

$$f_1(x) - f_2(y) + \langle \gamma(x), y - x \rangle \leq c(x, y).$$

For costs satisfying the "martingale triangle inequality," the dual can be restricted further to single functions obeying barycentric inequalities, connecting to uniformly convex and uniformly smooth function characterizations (Ciosmak, 2020).
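The extra term $\langle \gamma(x), y - x \rangle$ in the constraint costs nothing in the dual objective because it integrates to zero against every martingale coupling. The sketch below checks this on a toy coupling. The coupling `plan` and the test function `gamma` are hypothetical choices for illustration, and exact rational arithmetic is used so the identities hold without tolerances.

```python
from fractions import Fraction as Fr

# Hypothetical coupling: X is uniform on {-1, 1}; given X = x, Y equals
# x - 1 or x + 1 with equal probability, so E[Y | X = x] = x.
plan = {(x, x + s): Fr(1, 4) for x in (-1, 1) for s in (-1, 1)}


def conditional_mean(plan, x):
    """E[Y | X = x] under the discrete coupling plan."""
    mass = sum(m for (a, _), m in plan.items() if a == x)
    return sum(m * y for (a, y), m in plan.items() if a == x) / mass


is_martingale = all(conditional_mean(plan, x) == x for x in (-1, 1))


def gamma(x):
    """Arbitrary test function, chosen only for illustration."""
    return 3 * x * x - 7


# Integral of gamma(x) * (y - x) against the coupling.
linear_term = sum(m * gamma(x) * (y - x) for (x, y), m in plan.items())

print(is_martingale, linear_term)
```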

Second-Order Duality (Hessian-KR)

For $C^{1,1}(\mathbb{R}^d)$ functions $u$ with Hessian bounded by $\pm I$, duality holds between

$$\sup_{u:\| \nabla u \|_{\mathrm{Lip}} \leq 1} \Bigl\{ \int u\,d(\nu - \mu) \Bigr\}$$

and a three-point transport plan minimizing quadratic costs. The constraint $\| \nabla u \|_{\mathrm{Lip}} \leq 1$ corresponds to a spectral bound on the Hessian, and the primal involves three-marginal couplings under convex order. This theory enables applications in optimal design (e.g., grillage mechanics) and second-order probabilistic metrics (Bołbotowski et al., 30 Nov 2024).

5. Abstract Duality and Descriptive Set Theory

Abstract duality results allow extension to measurable spaces, equivalence relations, and cost functions irregular from a topological standpoint (analytic sets, $F_\sigma$ costs). Strong duality is proved for pairs $(E, G)$ (equivalence relation, sub-$\sigma$-algebra) when appropriate invariance conditions are met. Classical KR duality for total variation is recovered in these settings (Jaffe, 2022).

6. Generalized KR Duality: Functorial and Modal Logic Approaches

The duality can be cast categorically via functors (predicate liftings) defining the Kantorovich lifting (suprema over price functions) and the Wasserstein lifting (infima over couplings). Generalized KR duality holds when the Wasserstein lifting can be represented by the same family of modalities as the Kantorovich lifting. For $p$-Wasserstein with $p > 1$, extra modalities are required; however, for the Lévy–Prokhorov distance and convex powerset metrics, a single modality suffices, enabling sharp algorithmic and logical characterizations (Wild et al., 27 Oct 2025).

7. Applications and Functional Inequalities

KR duality underpins a host of functional inequalities. A classical consequence is the estimate for integration error in quadrature: for Lipschitz $f$ and empirical measure $\nu$,

$$\left| \int f \,d\mu - \int f \,d\nu \right| \leq \|\nabla f\|_{L^\infty} W_1(\mu, \nu)$$

Generalizations allow sharper bounds using Lorentz norms, e.g.,

$$\left| \int f \,dx - \frac{1}{N} \sum f(x_k) \right| \leq C \|\nabla f\|_{L_{d,1}} N^{-1/d} W_\infty(\mu, \nu)$$

These inequalities connect the geometry of function spaces and optimal transport distances, with conjectured families interpolating between the extremal cases (Steinerberger, 2020).
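
The classical estimate with $\|\nabla f\|_{L^\infty} W_1(\mu, \nu)$ on the right-hand side is easy to check numerically in one dimension. The sketch below is a simplified setting, not the general statement: both measures are taken to be empirical with equally many atoms, so that $W_1$ reduces to the mean absolute difference of sorted samples. The sample points and the choice $f = \sin$ are hypothetical.

```python
import math

# Hypothetical samples defining two empirical measures on the real line.
xs = [0.05, 0.30, 0.55, 0.80]
ys = [0.10, 0.20, 0.70, 0.95]


def w1_equal_weights(xs, ys):
    """W_1 between two empirical measures on R with equally many atoms.

    In one dimension the sorted (monotone) matching is optimal.
    """
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


f = math.sin     # |f'| = |cos| <= 1, so the Lipschitz constant is at most 1
lipschitz = 1.0

# Left side: difference of the two integrals of f. Right side: Lip(f) * W_1.
gap = abs(sum(map(f, xs)) / len(xs) - sum(map(f, ys)) / len(ys))
bound = lipschitz * w1_equal_weights(xs, ys)

print(gap, bound)
```

The bound holds for every 1-Lipschitz $f$. KR duality adds that some such $f$ attains it.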

Table: Primal and Dual Formulations in KR Duality

| Setting | Primal | Dual |
| --- | --- | --- |
| Classical Wasserstein-1 | $\inf_\pi \int d\,d\pi$ | $\sup_{f: \mathrm{Lip}(f)\le 1} \int f\,d(\mu-\nu)$ |
| General cost $c(x,y)$ | $\inf_\pi \int c\,d\pi$ | $\sup_{f,g: f(x)+g(y)\le c(x,y)} \int f\,d\mu+\int g\,d\nu$ |
| Martingale transport | $\inf_{\pi \text{ martingale}} \int c\,d\pi$ | $\sup_{f} \int f\,d(\mu-\nu)$ (with barycentric inequality constraints) |
| Vector measures | $\inf_\pi \int c\,d\pi$ over vector plans | $\sup_{(\psi_i,\varphi_i) : \sum_i (\psi_i(x)+\varphi_i(y))\,\eta_i(x,y)\le c(x,y)} \sum_i \Bigl( \int \psi_i\,d\mu_i + \int \varphi_i\,d\nu_i \Bigr)$ |
| Relative duality | $\inf_\pi \int \bar d\,d\pi$ | $\sup_{f \in \mathrm{Lip}_0(X,A)} \int f\,d(\mu - \nu)$ |

The structure, scope, and functional analysis underpinning Kantorovich–Rubinstein duality provide a unifying framework for the analysis of distances between probability measures, the construction of optimal transport plans, and the derivation of powerful inequalities in statistics, machine learning, and beyond. Recent generalizations and categorical approaches continue to widen its applicability and foundational significance.
