Dynamic Measure Transport (DMT)

Updated 9 November 2025

Dynamic Measure Transport (DMT) is a framework that models the evolution of probability measures via PDEs, unifying optimal transport, control theory, and mean-field games.
It employs both deterministic and stochastic dynamics with tilted reference paths to overcome teleportation issues, thereby enhancing sample quality.
The framework integrates kernel-based numerical methods and Gaussian processes to solve the underlying optimal control problems, with applications in generative modeling and Bayesian inference.

Dynamic Measure Transport (DMT) is a mathematical framework unifying dynamic formulations of measure transport, optimal control, and mean-field games. It generalizes classical optimal transport by modeling the evolution of probability measures subject to partial differential equations (PDEs) in both finite- and infinite-dimensional settings, with significant implications for sampling, generative modeling, and gradient flows. DMT encompasses both traditional “smooth” optimal transport and extensions to spaces characterized only by weak geometric or topological structure, including applications in probability, analysis, stochastic differential equations, and computational statistics.

1. Formal Definition and Mathematical Structures

Let $X = \mathbb{R}^d$ (or an extended metric-topological space), and fix two Borel probability measures $\eta$ (reference) and $\pi$ (target). DMT models the evolution $\{\mu_t\}_{t \in [0,1]}$ of probability measures such that $\mu_0 = \eta$ and $\mu_1 \approx \pi$, by either deterministic or stochastic dynamics: $dX_t = v(X_t, t)\,dt + \sigma\,dW_t,\quad X_0 \sim \eta,$ where $v: \mathbb{R}^d \times [0,1] \to \mathbb{R}^d$ is a drift field, $\sigma \geq 0$ is a noise parameter, and $W_t$ is standard Brownian motion. The law $\mu_t := \mathrm{Law}(X_t)$ evolves according to:

The continuity equation (ODE case, $\sigma=0$):

$\partial_t \mu_t + \nabla \cdot (v_t\,\mu_t) = 0,$

The Fokker–Planck equation (SDE case, $\sigma>0$):

$\partial_t \mu_t + \nabla \cdot (v_t\,\mu_t) = \frac{\sigma^2}{2}\Delta \mu_t.$
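The particle dynamics behind these two equations can be sketched with an Euler–Maruyama discretization. This is a minimal one-dimensional illustration in standard-library Python, not the method of the source: the function name `simulate_sde`, the constant drift, the step count, and the sample size are all assumptions chosen for the example.

```python
import math
import random


def simulate_sde(drift, sigma, x0_samples, n_steps=50, rng=None):
    """Euler-Maruyama for dX_t = v(X_t, t) dt + sigma dW_t on t in [0, 1] (1D)."""
    if rng is None:
        rng = random.Random(0)
    dt = 1.0 / n_steps
    sqrt_dt = math.sqrt(dt)
    xs = list(x0_samples)
    for k in range(n_steps):
        t = k * dt
        # With sigma = 0 the noise term vanishes and this is forward Euler
        # for the ODE whose law solves the continuity equation.
        xs = [x + drift(x, t) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
              for x in xs]
    return xs


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def constant_drift(x, t):
    # Hypothetical drift field: translate everything to the right at speed 3.
    return 3.0


rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(5000)]        # X_0 ~ eta = N(0, 1)

ode_end = simulate_sde(constant_drift, 0.0, x0)           # sigma = 0: pure transport
sde_end = simulate_sde(constant_drift, 0.5, x0, rng=rng)  # sigma > 0: transport plus diffusion
```

With $\sigma = 0$ every particle is shifted by exactly the integrated drift; with $\sigma > 0$ the empirical variance at $t=1$ grows by roughly $\sigma^2$, which is the effect of the Laplacian term in the Fokker–Planck equation.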

In more abstract contexts, DMT is defined on an extended metric-topological measure space $(X,\mathcal{T},d,m)$, where the distance $d$ may take the value $+\infty$ and $m$ is a Radon probability measure. The Cheeger energy $\mathcal{E}_C$ generalizes the Dirichlet energy and induces a dynamic transport cost (see Ambrosio et al., 2015). The DMT/Wasserstein–Cheeger distance $W_{\mathcal{E}_C}$ between absolutely continuous measures is given by a Benamou–Brenier-type formula.

2. Reference Paths, Teleportation Phenomena, and Their Limitations

A canonical construction in dynamic measure transport is the geometric annealing (or “annealed”) reference path: $\mu^{\mathrm{ref}}(x, t) \propto \eta(x)^{1-t} \pi(x)^t,\quad t \in [0,1],$ with log-density

$\log \mu^{\mathrm{ref}}(x, t) = (1-t)\log \eta(x) + t\log \pi(x) - \log Z(t),$

where $Z(t)$ is the time-dependent normalization constant. This choice is mathematically convenient: time derivatives of the log-density have analytic expressions, which facilitates density-driven algorithmic approaches.
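As a short worked example of such an expression, differentiating the log-density in $t$ gives the following, assuming differentiation under the integral sign in $Z(t) = \int \eta(x)^{1-t}\pi(x)^t\,dx$ is justified:

$\partial_t \log \mu^{\mathrm{ref}}(x, t) = \log\frac{\pi(x)}{\eta(x)} - \frac{d}{dt}\log Z(t), \qquad \frac{d}{dt}\log Z(t) = \mathbb{E}_{y \sim \mu^{\mathrm{ref}}(\cdot,t)}\left[\log\frac{\pi(y)}{\eta(y)}\right].$

In words, the time derivative of the log-density is the log-ratio $\log(\pi/\eta)$ centered by its mean under the current reference density.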

However, when $\eta$ and $\pi$ are multimodal or well-separated, $\mu^{\mathrm{ref}}$ exhibits “teleportation”: most of the mass may abruptly move from one mode to another near a critical time $t \approx t_0$. As a result, transport velocities $v$ must become large or highly irregular, and density-driven learning (aligning a velocity field to $\mu^{\mathrm{ref}}$) often fails to “split” or move mass correctly. Empirically, this is reflected in significant sample quality deficits or mode-dropping in sampling applications, as evidenced in one-dimensional Gaussian mixture experiments (Section 7 below).
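The effect can be checked numerically for the one-dimensional Gaussian mixture of Section 7. The sketch below evaluates the annealed density on a grid and integrates the mass lying to the left of a split point. The grid bounds, resolution, and the split at $x=-2$ are assumptions made for this illustration, as is the helper name `left_mode_mass`.

```python
import math


def log_normal_pdf(x, mean):
    # Log-density of N(mean, 1).
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


def log_eta(x):
    return log_normal_pdf(x, 0.0)


def log_pi(x):
    # Target 2/3 N(-8, 1) + 1/3 N(4, 1), combined with a stable log-sum-exp.
    a = math.log(2.0 / 3.0) + log_normal_pdf(x, -8.0)
    b = math.log(1.0 / 3.0) + log_normal_pdf(x, 4.0)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def trapezoid(values, dx):
    return dx * (sum(values) - 0.5 * (values[0] + values[-1]))


def left_mode_mass(t, lo=-20.0, hi=20.0, n=4001, split=-2.0):
    """Fraction of the annealed density eta^(1-t) * pi^t lying in x <= split."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    dens = [math.exp((1.0 - t) * log_eta(x) + t * log_pi(x)) for x in xs]
    left = [d for x, d in zip(xs, dens) if x <= split + 1e-9]
    # Dividing by the total mass plays the role of the normalization Z(t).
    return trapezoid(left, dx) / trapezoid(dens, dx)


for t in (0.0, 0.25, 0.5, 0.75, 0.9, 0.99, 1.0):
    print(f"t = {t:4.2f}   mass left of -2: {left_mode_mass(t):.4f}")
```

Under these assumptions the left-mode mass stays small for most of the path and only approaches the target value of $2/3$ close to $t = 1$, which is the abrupt late transfer described above.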

3. Optimal Control and Mean-Field Game Perspective

DMT can be naturally framed as an infinite-dimensional optimal control problem or mean-field game (MFG). For a curve of densities $\rho(t)$ and a velocity field $v$, the following variational problem combines fidelity costs with a kinetic action cost:

$\begin{aligned} \min_{\rho,\,v} \quad & D_{\mathrm{KL}}(\rho(1)\|\pi) + \int_0^1 \left[(1-t) D_{\mathrm{KL}}(\rho(t)\|\eta) + t D_{\mathrm{KL}}(\rho(t)\|\pi)\right] dt \\ &+ \int_0^1 \mathbb{E}_{x\sim\rho(t)}\left[\frac{1}{2}|v(x,t)|^2\right]dt \\ \text{subject to} \quad &\partial_t \rho + \nabla \cdot (\rho v) = 0,\qquad \rho(0)=\eta. \end{aligned}$

For each fixed $t$, the KL-interaction $(1-t) D_{\mathrm{KL}}(\rho\|\eta) + t D_{\mathrm{KL}}(\rho\|\pi)$ has a unique minimizer, given by the geometric annealing density $\mu^{\mathrm{ref}}(\cdot, t)$. The formal first-order optimality system couples a forward continuity equation with a backward Hamilton–Jacobi–Bellman (HJB) equation. In this formulation the initial measure is fixed, while the terminal mismatch with $\pi$ is penalized through the KL term and balanced against action minimization.
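The uniqueness of this minimizer can be verified directly from the definition of the reference path. Using $(1-t)\log\eta + t\log\pi = \log\mu^{\mathrm{ref}}(\cdot,t) + \log Z(t)$ and $\int \rho\,dx = 1$:

$(1-t)\,D_{\mathrm{KL}}(\rho\|\eta) + t\,D_{\mathrm{KL}}(\rho\|\pi) = \int \rho \left[\log\rho - (1-t)\log\eta - t\log\pi\right] dx = D_{\mathrm{KL}}\!\left(\rho\,\|\,\mu^{\mathrm{ref}}(\cdot,t)\right) - \log Z(t).$

Since $\log Z(t)$ does not depend on $\rho$ and the KL divergence vanishes only when its two arguments coincide, the expression is minimized exactly at $\rho = \mu^{\mathrm{ref}}(\cdot,t)$.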

This MFG/control-theoretic framing enables flexible introduction of path-dependent fidelity and regularization criteria beyond what is available in standard OT formulations.

4. Tilted-Path Formulation and Optimization

To overcome pathologies (e.g., “teleportation”) of standard reference paths, DMT introduces a “tilting function” $g(x,t)$. The density path is reparametrized as $\log \rho^g(x, t) = \log \mu^{\mathrm{ref}}(x, t) + g(x, t) - \log Z_g(t),$ where $Z_g(t)$ renormalizes the tilted density (it is distinct from the normalization $Z(t)$ of $\mu^{\mathrm{ref}}$), and the boundary conditions $g(\cdot,0)=g(\cdot,1)=0$ ensure $\rho^g(0) = \eta$ and $\rho^g(1) = \pi$. The optimal control problem is then

$\min_{v \in \mathcal{V},\, g \in \mathcal{G}} \|v\|_{\mathcal{V}}^2 + \lambda_g \|g\|_{\mathcal{G}}^2$

subject to the continuity equation $\partial_t \rho^g + \nabla \cdot (v \rho^g)=0$ and constraints on $g$ . The spaces $\mathcal{V}, \mathcal{G}$ typically employ Sobolev or reproducing kernel Hilbert space (RKHS) norms to enforce spatial/temporal smoothness; e.g.,

$\|v\|^2_{\mathcal{V}} = \int_0^1 \|v(\cdot,t)\|^2_{H^s_x} dt,\qquad \|g\|^2_{\mathcal{G}} = \int_0^1 \|g(\cdot,t)\|^2_{H^r_x} dt.$

Equivalently, an augmented Lagrangian can be introduced for deriving necessary optimality conditions in mixed PDE form, leading to a flexible Banach-space optimization framework.
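In one dimension, the continuity-equation constraint determines the velocity from any given density path: integrating $\partial_t \rho + \partial_x(v\rho) = 0$ in $x$ (assuming the flux vanishes at $-\infty$) gives $v\rho = -\partial_t F$, where $F$ is the cumulative distribution function of $\rho$. The sketch below applies this with finite differences on a grid. It is an illustration only, not the solver of the source; the function names, grid, step size `h`, and density floor are assumptions, and the path is passed as an unnormalized log-density so that a tilted path $\log\mu^{\mathrm{ref}} + g$ could be supplied in the same way.

```python
import math


def normalized_density(log_density, xs, t):
    """Evaluate exp(log_density) on the grid and normalize by the trapezoid rule."""
    dx = xs[1] - xs[0]
    dens = [math.exp(log_density(x, t)) for x in xs]
    total = dx * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
    return [d / total for d in dens]


def cumulative(dens, dx):
    """Cumulative trapezoid rule: approximate CDF on the grid."""
    out = [0.0]
    for a, b in zip(dens, dens[1:]):
        out.append(out[-1] + 0.5 * (a + b) * dx)
    return out


def velocity_1d(log_density, xs, t, h=1e-4, floor=1e-8):
    """v = -(dF/dt) / rho, with dF/dt from a central difference in time."""
    dx = xs[1] - xs[0]
    rho = normalized_density(log_density, xs, t)
    f_plus = cumulative(normalized_density(log_density, xs, t + h), dx)
    f_minus = cumulative(normalized_density(log_density, xs, t - h), dx)
    # Where the density is below `floor` the ratio is unreliable, so return NaN.
    return [-(fp - fm) / (2.0 * h) / r if r > floor else float("nan")
            for fp, fm, r in zip(f_plus, f_minus, rho)]


def moving_gaussian(x, t):
    # Test path: N(3t, 1), a unit Gaussian translating at constant speed 3.
    return -0.5 * (x - 3.0 * t) ** 2


xs = [-10.0 + 0.01 * i for i in range(2001)]
v = velocity_1d(moving_gaussian, xs, 0.5)
```

For the translating Gaussian used here the recovered velocity should be close to the constant 3 wherever the density is not negligible, which gives a simple sanity check before applying the same routine to an annealed or tilted path.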

5. Numerical Solution via Gaussian Processes and Collocation

For practical computation, the tilted DMT control problem is discretized via a Gaussian process (GP) and collocation approach:

Choose a collocation grid $\{(x_j, t_j)\}_{j=1}^J$ in $\mathbb{R}^d \times [0,1]$ and select boundary points for $t=0,1$.
Model the scalar potential $u$ (so $v = \nabla u$) and tilt $g$ as elements of scalar-valued RKHSs $\mathcal{H}_u$, $\mathcal{H}_g$ with product kernels $K_u,K_g$ (typically Matérn-type in space/time with lengthscales $\sigma_x$, $\sigma_t$).
Enforce the nonlinear residual of the continuity or Fokker–Planck equation at each interior collocation point, i.e., $F_j(z_j;c)=0$, where $z_j$ collects the needed derivatives and $c$ handles time normalization. At boundaries, impose $g=0$.
By the representer theorem, the minimizers $u^*,g^*$ lie in finite spans of kernel sections evaluated or differentiated at grid points.

The empirical optimization reduces to a penalized least-squares problem: $\min_{z_u, z_g, c} \quad z_u^\top K_u(\phi, \phi)^{-1}z_u + \lambda_g z_g^\top K_g(\psi, \psi)^{-1}z_g + \lambda_{\mathrm{pde}} \sum_{j=1}^J |F_j(z_j; c)|^2 + \lambda_{\mathrm{bc}} \sum_{\text{boundary}} |g|^2,$ which is solved via trust-region methods such as Levenberg–Marquardt, using Cholesky parameterization of the Gram matrices.

6. Theoretical Results: Representer Theorem and Well-posedness

Representer theorem for DMT collocation: Given an RKHS $\mathcal{H}$ with kernel $K$ and a finite set of bounded linear functionals $\{\ell_i\}_{i=1}^n$, the minimum-norm element of $\mathcal{H}$ satisfying the constraints $\ell_i(h) = d_i$ is $h^* = \sum_{i=1}^n \alpha_i K(\cdot, \ell_i),$ where $K(\cdot, \ell_i)$ denotes $\ell_i$ applied to the second argument of $K$, the Gram matrix $K(\ell, \ell)$ has entries $K(\ell_j, \ell_i) = \ell_j[K(\cdot, \ell_i)]$ and is assumed invertible, and $\alpha$ solves $K(\ell, \ell) \alpha = d$. This structure ensures all learned objects admit efficient parametrization in terms of kernel sections induced by collocation.
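The simplest instance of this statement takes each $\ell_i$ to be a point evaluation, $\ell_i(h) = h(x_i)$, so that the kernel sections are $K(\cdot, x_i)$ and the Gram matrix is the usual kernel matrix. The sketch below solves $K\alpha = d$ with a hand-written Cholesky factorization in standard-library Python. The nodes, data, unit lengthscale, and function names are assumptions for illustration; the derivative functionals used in collocation are not covered.

```python
import math


def matern52(x, y, lengthscale=1.0):
    """Matern-5/2 kernel in one dimension."""
    s = math.sqrt(5.0) * abs(x - y) / lengthscale
    return (1.0 + s + s * s / 3.0) * math.exp(-s)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cholesky_solve(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_min_norm_interpolant(nodes, values, kernel=matern52):
    gram = [[kernel(xi, xj) for xj in nodes] for xi in nodes]
    alpha = cholesky_solve(cholesky(gram), values)

    def h(x):
        # h* is a finite combination of kernel sections K(., x_i).
        return sum(a * kernel(x, xi) for a, xi in zip(alpha, nodes))

    # Squared RKHS norm of h*: alpha^T K alpha = alpha^T d.
    norm_sq = sum(a * d for a, d in zip(alpha, values))
    return h, alpha, norm_sq


nodes = [-2.0, -1.0, 0.0, 1.0, 2.0]
values = [math.sin(x) for x in nodes]
h, alpha, norm_sq = fit_min_norm_interpolant(nodes, values)
```

The quantity `norm_sq` has the same form $d^\top K^{-1} d$ as the quadratic terms in the penalized least-squares objective of Section 5, so the same factorization can be reused there.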

Existence of minimizers: Under mild assumptions (smooth, positive densities for the reference path; strictly positive regularization weights $\lambda_g, \lambda_{\mathrm{pde}}, \lambda_{\mathrm{bc}}$), the objective of the finite-dimensional penalized least-squares problem is coercive and continuous, which guarantees the existence of a minimizer.

These results provide a rigorous foundation for kernel-based and collocation-based implementations and imply provable smoothness of the optimal transport velocity fields and tilts.

7. Empirical Assessment and Sampling Applications

In one-dimensional experiments with reference $\eta = \mathcal{N}(0, 1)$ and target $\pi = \frac{2}{3} \mathcal{N}(-8, 1) + \frac{1}{3} \mathcal{N}(4, 1)$, the geometric annealing path $\mu^{\mathrm{ref}}$ fails to transport mass to the left mode (centered at $x=-8$), reflecting the teleportation pathology. Learned velocity fields along this path produce samplers missing regions of the target.

The tilted DMT approach (using the Banach-space control framework and GP solver) avoids teleportation and yields smooth, balanced mass transfer into both modes, demonstrated by:

Fraction of trajectories capturing the left mode (true = 0.667): reference $\approx 0.005$ vs. tilt-learned $\approx 0.375$.
Relative error in mean: $1.80$ (ref) vs. $0.88$ (learned).
Relative error in variance: $0.96$ (ref) vs. $0.016$ (learned).
Kernel MMD: $0.743$ (ref) vs. $0.137$ (learned).
The spatial RKHS norm of the learned velocity remains stable under the tilted path, indicating superior regularity.
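Metrics of this kind can be computed from samples along the following lines. The source does not state the exact definitions, so the ones below are assumptions: relative error is taken as $|\text{estimate} - \text{truth}| / |\text{truth}|$, the left mode is counted as $x \le -2$, and the MMD uses a Gaussian kernel with unit bandwidth and the biased (V-statistic) estimator. The target moments follow from the mixture definition: mean $-4$ and variance $33$.

```python
import math

# Moments of pi = 2/3 N(-8, 1) + 1/3 N(4, 1).
TRUE_MEAN = (2.0 / 3.0) * (-8.0) + (1.0 / 3.0) * 4.0                             # -4
TRUE_VAR = (2.0 / 3.0) * (1.0 + 64.0) + (1.0 / 3.0) * (1.0 + 16.0) - TRUE_MEAN ** 2  # 33


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def relative_error(estimate, truth):
    return abs(estimate - truth) / abs(truth)


def left_mode_fraction(samples, split=-2.0):
    # The split point is an assumption; any value between the modes would do.
    return sum(1 for x in samples if x <= split) / len(samples)


def gaussian_mmd(xs, ys, bandwidth=1.0):
    """Biased MMD estimate with a Gaussian kernel (O(len(xs) * len(ys)) cost)."""
    def avg_kernel(a, b):
        total = sum(math.exp(-(u - v) ** 2 / (2.0 * bandwidth ** 2))
                    for u in a for v in b)
        return total / (len(a) * len(b))

    mmd_sq = avg_kernel(xs, xs) + avg_kernel(ys, ys) - 2.0 * avg_kernel(xs, ys)
    return math.sqrt(max(mmd_sq, 0.0))  # guard against tiny negative round-off


# Hypothetical degenerate sampler that puts every particle at the right mode.
collapsed = [4.0] * 100
mean_err = relative_error(mean(collapsed), TRUE_MEAN)
var_err = relative_error(variance(collapsed), TRUE_VAR)
left_frac = left_mode_fraction(collapsed)
self_mmd = gaussian_mmd(collapsed, collapsed)
```

Under these definitions, a sampler that collapses entirely onto the right mode has a relative mean error of 2 and a relative variance error of 1, the same order as the reference-path figures listed above.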

Trajectory visualizations corroborate the improved spatial smoothness and sampling fidelity achieved by the tilted DMT method relative to analytic McCann velocity or standard reference paths.

Key implementation and application principles include:

Regularization: $\lambda_g$ balances the scale of the tilt; $\lambda_{\mathrm{pde}}$ and $\lambda_{\mathrm{bc}}$ enforce PDE and boundary condition fidelity.
Kernel hyperparameters: the lengthscales $\sigma_x$ and $\sigma_t$ govern the spatial/temporal smoothness of the velocity and tilt; larger values yield smoother but less flexible paths.
Collocation resolution $J$ drives the tradeoff between accuracy and computational cost, with worst-case cubic scaling in $J$ for dense kernel-matrix factorizations and linear solves. Inducing-point or hierarchical strategies can reduce this cost.
DMT is broadly applicable: generative modeling (continuous normalizing flows, diffusion models), density-driven/annealing samplers, Bayesian inference, obstacle-aware robotic transport (via tilt $g$), and fine-tuning of pretrained generative models.

In the context of non-smooth, infinite-dimensional, or “Wiener-like” spaces, DMT generalizes classical optimal transport and the Otto calculus, leveraging the Cheeger energy and Benamou–Brenier-type dynamic characterizations. Heat semigroups generated by Dirichlet forms, analyzed via the evolution variational inequality (EVI), admit contractivity and curvature results extending well beyond the $L^2$-Wasserstein theory (Ambrosio et al., 2015). A plausible implication is that DMT provides a unifying mathematical and computational infrastructure for measure-valued dynamics in both classical and highly singular regimes.

References

Ambrosio et al. (2015). Optimal transport, Cheeger energies and contractivity of dynamic transport distances in extended spaces.
