
Wasserstein Metric Regularization

Updated 5 September 2025
  • Wasserstein metric regularization is a framework that integrates optimal transport distances as regularizers to promote stability, generalized convexity, and convergence in variational problems.
  • It employs Moreau–Yosida regularization adapted to Wasserstein space, enabling precise discrete approximations of gradient flows, even when traditional contraction fails.
  • The approach bridges Hilbert space methods with nonlinear optimal transport, facilitating rigorous convergence analysis, entropy dissipation, and advanced numerical schemes for PDEs.

Wasserstein metric regularization is a class of methodologies that employs the Wasserstein (optimal transport) distance as either a primary objective or a regularizing term within optimization, statistical, and learning frameworks. This approach leverages the geometry of probability measures endowed with the Wasserstein metric, often as a mechanism for enforcing stability, convexity, or invariance properties that are not easily expressed with classical norms or divergences. Regularization in the Wasserstein space fundamentally departs from linear settings due to the nonlinear (and often geodesic) structure of the spaces involved, enabling crucial advances in discrete gradient flow, variational analysis, entropy dissipation, PDE theory, and functional inequalities.

1. Moreau–Yosida Regularization in the 2-Wasserstein Metric

The Moreau–Yosida regularization of a functional $E$ over a subset of probability measures in the 2-Wasserstein space $(\mathcal{P}_2, W_2)$ is defined as

$$E_\tau(\mu) = \inf_\nu \left\{ \frac{1}{2\tau} W_2^2(\mu, \nu) + E(\nu) \right\}.$$

Here $\tau > 0$ is a time-step parameter, and the infimum is taken over those $\nu$ at finite $W_2$ distance from a reference measure. The construction mimics the Hilbert-space one but is adapted to the nonlinear geometry of the Wasserstein space: in Hilbert spaces, Moreau–Yosida regularization smooths functionals and strengthens convexity; in the 2-Wasserstein metric, full convexity along all geodesics is generally unachievable, but a generalized convexity can be preserved.

$E_\tau$ serves as the foundation for constructing time-discrete approximations of gradient flows and for proving functional inequalities. In the Wasserstein context, it provides a rigorous apparatus for approximating nonsmooth functionals and controlling their evolution with respect to the underlying optimal transport metric.
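As a concrete numerical illustration of the envelope, the sketch below restricts to 1-D Gaussians $N(m, s^2)$, where $W_2$ reduces to the Euclidean distance between $(m, s)$ pairs, and takes $E$ to be the relative entropy with respect to $N(0,1)$ in closed form. All modeling choices here are assumptions for illustration, not constructions from the source:

```python
import numpy as np
from scipy.optimize import minimize

# Toy model: 1-D Gaussians N(m, s^2).  Between two such Gaussians,
# W2^2 = (m1 - m2)^2 + (s1 - s2)^2, and we take E to be the relative
# entropy KL( N(m, s^2) || N(0, 1) ), which has a closed form.
def w2_sq(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def energy(p):
    m, s = p
    return 0.5 * (s**2 + m**2 - 1.0 - 2.0 * np.log(s))

def moreau_yosida(mu, tau):
    # E_tau(mu) = inf_nu { W2^2(mu, nu) / (2 tau) + E(nu) }, solved numerically
    obj = lambda nu: w2_sq(mu, nu) / (2.0 * tau) + energy(nu)
    return minimize(obj, x0=np.array(mu), bounds=[(None, None), (1e-6, None)]).fun

mu = (2.0, 3.0)  # N(2, 9)
print(energy(mu), moreau_yosida(mu, 0.1), moreau_yosida(mu, 1.0))
```

The computed envelope lies below $E(\mu)$ and decreases as $\tau$ grows, exactly as the infimal definition dictates.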

2. Proximal Map, Stepwise Contraction, and the $\Lambda_\tau$ Functional

The associated proximal map is given by

$$J_\tau(\mu) = \operatorname{argmin}_\nu \left\{ \frac{1}{2\tau} W_2^2(\mu, \nu) + E(\nu) \right\}.$$

Unlike in the Hilbert case, the mapping $J_\tau$ is not contractive in $W_2$, because uniform convexity of $W_2^2$ fails along all geodesics in dimension $d \geq 2$. To restore a contraction property, the paper introduces a modified distance functional

$$\Lambda_\tau(\mu, \nu) = W_2^2(\mu, \nu) + \frac{\tau^2}{2}\left[\,|\nabla_{(W)}E(\mu)|^2 + |\nabla_{(W)}E(\nu)|^2\,\right],$$

where $|\nabla_{(W)}E|$ denotes the metric slope (the modulus of the Wasserstein gradient). Under $\lambda$-convexity of $E$ along generalized geodesics, the discrete gradient flow defined by repeated application of $J_\tau$ satisfies the contraction inequality

$$\Lambda_\tau(J_\tau(\mu), J_\tau(\nu)) \leq \Lambda_\tau(\mu, \nu).$$

This inequality provides quantitative control over the evolution of discrete solutions, even in scenarios where contraction in W2W_2 fails, facilitating convergence analysis and stability under iterations.
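This stepwise contraction can be observed numerically in a toy restriction to 1-D Gaussians $N(m, s^2)$, with $E$ the relative entropy to $N(0,1)$ (which is $1$-convex in this family) and the slope computed analytically as $|\nabla_{(W)}E|^2 = m^2 + (s - 1/s)^2$. All of these modeling choices are illustrative assumptions, not the paper's setting:

```python
import numpy as np
from scipy.optimize import minimize

# Toy model: 1-D Gaussians N(m, s^2), with W2^2 = (m1-m2)^2 + (s1-s2)^2
# and E = KL( N(m, s^2) || N(0, 1) ).
def w2_sq(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def energy(p):
    m, s = p
    return 0.5 * (s**2 + m**2 - 1.0 - 2.0 * np.log(s))

def slope_sq(p):
    # Relative Fisher information of N(m, s^2) w.r.t. N(0, 1), derived
    # analytically for this family: |grad_W E|^2 = m^2 + (s - 1/s)^2.
    m, s = p
    return m**2 + (s - 1.0 / s) ** 2

def prox(p, tau):
    # J_tau(p) = argmin_q { W2^2(p, q) / (2 tau) + E(q) }
    obj = lambda q: w2_sq(p, q) / (2.0 * tau) + energy(q)
    return tuple(minimize(obj, x0=np.array(p),
                          bounds=[(None, None), (1e-6, None)]).x)

def lam_tau(p, q, tau):
    return w2_sq(p, q) + 0.5 * tau**2 * (slope_sq(p) + slope_sq(q))

tau = 0.2
pairs = [((2.0, 3.0), (-1.0, 0.5))]
for _ in range(5):
    mu, nu = pairs[-1]
    pairs.append((prox(mu, tau), prox(nu, tau)))
vals = [lam_tau(mu, nu, tau) for mu, nu in pairs]
print(vals)  # nonincreasing along the discrete flow
```

Both iterates converge toward the common minimizer $N(0,1)$, and $\Lambda_\tau$ decreases at every step even though plain $W_2$ contraction is not available in general.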

3. "Above the Tangent Line" Inequality, Talagrand, and HWI Inequalities

A crucial technical ingredient is the "above the tangent line" inequality:

$$E_\tau(\mu_\alpha) \leq (1-\alpha) E_\tau(\bar{\mu}) + \alpha E_\tau(\mu) - \alpha(1-\alpha)\frac{\lambda_\tau}{2}W_2^2(\bar{\mu}, \mu),$$

where $\mu_\alpha$ lies along a generalized geodesic, $\bar{\mu}$ is the minimizer, and the regularized convexity constant is $\lambda_\tau = \lambda/(1 + \lambda\tau)$. This inequality refines classical convexity via the effective parameter $\lambda_\tau$, and it is not restricted to geodesics: it holds along generalized geodesics adapted to the regularity of $E$.

Direct corollaries include:

  • Talagrand inequality: $E_\tau(\mu) - E_\tau(\bar{\mu}) \geq (\lambda_\tau/2)\, W_2^2(\mu, \bar{\mu})$.
  • HWI inequality: $E_\tau(\mu) - E_\tau(\bar{\mu}) \leq |\nabla_{(W)}E_\tau(\mu)|\, W_2(\mu, \bar{\mu}) - (\lambda_\tau/2)\, W_2^2(\mu, \bar{\mu})$.

These results are essential for deriving rates of contraction, convergence to equilibrium, and for quantifying the dissipation of entropy or generalized energy along discrete approximations to the gradient flow.
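The effective constant $\lambda_\tau = \lambda/(1+\lambda\tau)$ has a transparent Hilbert-space analogue, worth recording as a model computation (this is the flat-space case, not the Wasserstein argument itself). For the quadratic $E(y) = \frac{\lambda}{2}|y|^2$ on a Hilbert space,

$$E_\tau(x) = \inf_y \left\{ \frac{1}{2\tau}|x - y|^2 + \frac{\lambda}{2}|y|^2 \right\},$$

the infimum is attained at $y^* = x/(1+\lambda\tau)$, and substituting back gives

$$E_\tau(x) = \frac{1}{2\tau}\left(\frac{\lambda\tau}{1+\lambda\tau}\right)^2|x|^2 + \frac{\lambda}{2}\,\frac{|x|^2}{(1+\lambda\tau)^2} = \frac{\lambda}{2(1+\lambda\tau)}\,|x|^2.$$

Thus the envelope is exactly $\lambda/(1+\lambda\tau)$-convex, which is the modulus $\lambda_\tau$ appearing in the inequalities above.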

4. Applications to Gradient Flows of Rényi Entropies and Nonlinear PDEs

For $E$ corresponding to (signed) Rényi entropies, e.g.

$$E_p(\mu) = \int U_p(f(x))\,dx \quad \text{with} \quad U_p(s) = \frac{s^p - s}{p-1}$$

(where $f$ is the density of $\mu$), the continuous $W_2$ gradient flow corresponds to nonlinear diffusion equations:

  • $p > 1$ yields the porous medium equation.
  • $p < 1$ gives the fast diffusion equation.

The theory demonstrates that key features, such as preservation of and convergence to Barenblatt self-similar solutions, are already present at the level of the discrete-in-time scheme induced by the repeated proximal map, not merely in the vanishing time-step limit. The contraction in $\Lambda_\tau$ yields explicit, sharp polynomial convergence rates to equilibrium (after rescaling), and the scheme is robust to degeneracies in the entropy functional. This directly connects the abstract metric regularization to concrete PDE analysis.
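A minimal 1-D sketch of one proximal (JKO-type) step for the Rényi entropy, assuming an equal-mass particle discretization: for sorted positions, $W_2^2$ is a mean squared displacement and the density is piecewise constant between neighbours. The parameters $n$, $p$, $\tau$ and the discretization itself are illustration choices, not the paper's scheme:

```python
import numpy as np
from scipy.optimize import minimize

# One proximal step for the Renyi entropy E_p with p = 2, whose W2 gradient
# flow is the porous medium equation.  The measure is n equal-mass particles
# on the line; between sorted neighbours the density is (1/n) / gap.
n, p, tau = 30, 2.0, 0.05

def entropy(x):
    gaps = np.diff(np.sort(x))
    f = (1.0 / n) / gaps                      # piecewise-constant density
    return np.sum((f**p - f) / (p - 1.0) * gaps)

def w2_sq(x, y):
    # For equal-mass particles in 1-D, optimal transport pairs sorted points.
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

def jko_step(x, tau):
    obj = lambda y: w2_sq(x, y) / (2.0 * tau) + entropy(y)
    return np.sort(minimize(obj, x0=x).x)

x = np.sort(np.random.default_rng(0).normal(size=n))
for _ in range(3):
    x = jko_step(x, tau)                      # E_p decreases at every step
print(entropy(x))
```

Taking $\nu = \mu$ as a competitor in the proximal problem shows $E_p(J_\tau(\mu)) \leq E_p(\mu)$, so the entropy is dissipated at every discrete step, which is the property the sketch exhibits.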

5. Generalized Convexity of Moreau–Yosida in $(\mathcal{P}_2, W_2)$

Convexity in the Wasserstein space must be understood along so-called generalized geodesics, a broader class of curves than the classical ones. If $E$ is $\lambda$-convex along these, then $E_\tau$ inherits a precise generalized convexity:

  • Not every geodesic preserves convexity, but the "above the tangent line" property holds with the effective constant $\lambda_\tau$.
  • This generalized convexity is sufficient for uniform rates and stability analysis, and ensures that the discrete gradient flows preserve the regularization advantages familiar from the Hilbert space Moreau–Yosida regularization, adapted to the nonlinear setting.

This is vital for approximation theory and for the design of numerical schemes for diffusion- or transport-driven flows where standard convexity properties break down.
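A numerical sanity check of this inherited convexity, under an illustrative restriction to 1-D Gaussians $N(m, s^2)$: there, geodesics are straight segments in $(m, s)$, the relative entropy to $N(0,1)$ is $1$-convex, and the envelope should satisfy the tangent-line inequality with $\lambda_\tau = 1/(1+\tau)$. Everything below is a toy assumption, not the general argument:

```python
import numpy as np
from scipy.optimize import minimize

# Restrict to 1-D Gaussians N(m, s^2): W2 is the Euclidean metric on (m, s)
# and E = KL( N(m, s^2) || N(0, 1) ) is 1-convex, so E_tau should be
# lambda_tau-convex with lambda_tau = 1 / (1 + tau).
def w2_sq(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def energy(p):
    m, s = p
    return 0.5 * (s**2 + m**2 - 1.0 - 2.0 * np.log(s))

def e_tau(p, tau):
    obj = lambda r: w2_sq(p, r) / (2.0 * tau) + energy(r)
    return minimize(obj, x0=np.array(p), bounds=[(None, None), (1e-6, None)]).fun

tau = 0.5
lam_tau = 1.0 / (1.0 + tau)
p, q = np.array([2.0, 3.0]), np.array([-1.0, 0.5])
ok = []
for a in (0.25, 0.5, 0.75):
    lhs = e_tau(tuple((1 - a) * p + a * q), tau)
    rhs = ((1 - a) * e_tau(tuple(p), tau) + a * e_tau(tuple(q), tau)
           - a * (1 - a) * (lam_tau / 2.0) * w2_sq(p, q))
    ok.append(lhs <= rhs + 1e-6)
print(ok)  # tangent-line inequality holds at each interpolation point
```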

6. Broader Implications and Future Directions

The developed regularization theory establishes a robust link between discrete (proximal-scheme) and continuous gradient flows in the Wasserstein metric, supporting rigorous analysis of approximation, convergence, and rates. Key contributions include:

  • Transfer of Hilbert-space optimization methodologies (e.g., variational inequalities, proximal maps) to nonlinear, infinite-dimensional transport spaces.
  • Explicit translation of discrete invariances and symmetry properties (such as translation invariance for Rényi entropies) to the regularized setting.
  • Identification of appropriate functionals and geometric constructs for controlling stability and error in both theory and numerical practice.

Future research avenues include:

  • Extending these regularization approaches to broader functional classes, including those not directly tied to entropy.
  • Investigating numerical discretization and implementation of generalized-geodesic-based schemes in high dimensions.
  • Further exploring the interplay between invariance properties and the variational structure induced by optimal transport regularization.
  • Generalizing the framework to other metric measure spaces beyond $(\mathcal{P}_2, W_2)$, particularly where geodesic structure and curvature are nontrivial.

The general principle is that, by modifying classical regularization techniques to respect the nonlinear geometry of Wasserstein space, one achieves discrete approximation and analytic tools for studying and simulating gradient flows in settings governed by optimal transport. This not only advances the mathematical analysis of nonlinear diffusions and related PDEs, but also lays a foundation for further applications in approximation theory and numerical analysis within the optimal transport paradigm.