Optimal Flow Matching (OFM)

Updated 24 November 2025

OFM is a framework for generative modeling that transports a source distribution to a target distribution along straight, optimal-transport trajectories.
It employs convex-potential parameterization via ICNNs and minimizes a specialized loss to recover the unique Brenier map with minimal kinetic action.
Empirical evaluations demonstrate that OFM achieves high-quality generation in few steps, offering competitive performance with lower inference cost despite computational challenges in high dimensions.

Optimal Flow Matching (OFM) is a theoretical and algorithmic framework for generative modeling that constructs deterministic flows transporting a source distribution to a target distribution along straight optimal-transport (displacement) trajectories. OFM is tightly integrated with optimal transport (OT) theory, which yields provable guarantees on the linearity and minimal action of the learned paths, an equivalence to the Kantorovich dual, and fast, high-quality generation. The following sections summarize the central concepts, mathematical formulation, algorithmic approaches, and empirical properties of OFM, referencing recent foundational works in the area.

1. Mathematical Formulation and Core Principles

Let $\mu = p_0$ denote the source (e.g., Gaussian noise) and $\nu = p_1$ the target (data) distribution, both on $\mathbb{R}^d$. The central OFM paradigm is as follows:

Coupling: Define a coupling (joint law) $\pi \in \Pi(\mu, \nu)$ such that $\pi(A \times \mathbb{R}^d) = \mu(A)$ and $\pi(\mathbb{R}^d \times B) = \nu(B)$ for all measurable $A,B$.
Linear Interpolation: For coupled pairs $(x_0, x_1)$, form interpolations $x_t = (1-t)x_0 + t x_1$ with $t \in [0,1]$.
Constant Velocity Reference: The reference velocity for flow matching is $u_t(x_t \mid x_0, x_1) = x_1 - x_0$, leading to linear (straight-line) trajectories.
Optimal Transport Constraint: Choose $\pi$ as the minimizer of the quadratic cost OT problem:

$W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int \|x_0 - x_1\|_2^2 d\pi(x_0, x_1)$

By Brenier's theorem, when $\mu$ is absolutely continuous the optimal coupling is induced by a map $T^* = \nabla \Psi^*$ for a convex potential $\Psi^*$, with $T^*_\# \mu = \nu$.
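As a concrete illustration of the Brenier map and the straight-line interpolation, the following minimal sketch uses one-dimensional Gaussians, for which the quadratic-cost OT map and $W_2^2$ are known in closed form. The function names and the numerical values are illustrative assumptions and do not come from the cited works.

```python
def brenier_map_1d(x, m0, s0, m1, s1):
    """Closed-form quadratic-cost OT map from N(m0, s0^2) to N(m1, s1^2).

    It is the derivative of the convex potential
    Psi(x) = m1 * x + (s1 / (2 * s0)) * (x - m0) ** 2.
    """
    return m1 + (s1 / s0) * (x - m0)


def interpolate(x0, x1, t):
    """Straight-line interpolation x_t = (1 - t) * x0 + t * x1."""
    return (1.0 - t) * x0 + t * x1


def w2_squared_gaussian_1d(m0, s0, m1, s1):
    """Closed-form W_2^2 between two one-dimensional Gaussians."""
    return (m0 - m1) ** 2 + (s0 - s1) ** 2


# Assumed toy setting: source N(0, 1), target N(3, 2^2).
x0 = 0.5
x1 = brenier_map_1d(x0, 0.0, 1.0, 3.0, 2.0)
for t in (0.0, 0.25, 0.5, 1.0):
    # The velocity along the path is x1 - x0 for every t.
    print(t, interpolate(x0, x1, t), x1 - x0)
print("W2^2 =", w2_squared_gaussian_1d(0.0, 1.0, 3.0, 2.0))
```

Because the map is increasing in $x$, two trajectories started at different points never cross, which is the one-dimensional picture of the non-intersecting paths discussed in this section.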

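OFM Loss Function: Restrict flow matching to vector fields generated by a convex potential $\Psi$. With the flow map $\phi_{t,\Psi}(z_0) = (1-t) z_0 + t \nabla\Psi(z_0)$, the vector field at $x_t$ is $u_{t,\Psi}(x_t) = \nabla\Psi(z_0) - z_0$, where $z_0 = \phi_{t,\Psi}^{-1}(x_t)$ is the starting point of the trajectory that passes through $x_t$ at time $t$: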

$\mathcal{L}_{\operatorname{OFM}}(\Psi) = \int_0^1 \mathbb{E}_{(x_0,x_1)\sim\pi} \left\| u_{t,\Psi}(x_t) - (x_1 - x_0) \right\|^2 dt$

This loss is minimized exactly when $\nabla\Psi$ is the OT map $T^*$.

Equivalence to OT Dual: For quadratic cost, OFM loss satisfies

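$\mathcal{L}_{\operatorname{OFM}}(\Psi) = 2 \mathcal{L}_{\operatorname{OT}}(\Psi) + \mathrm{const}$

where $\mathcal{L}_{\operatorname{OT}}(\Psi) = \int \Psi(x_0)d\mu(x_0) + \int \overline{\Psi}(x_1)d\nu(x_1)$ is the Kantorovich dual objective, $\overline{\Psi}(y) = \sup_x [\langle x, y \rangle - \Psi(x)]$ is the convex conjugate of $\Psi$ (not to be confused with the optimal potential $\Psi^*$ above), and the constant depends on $\pi$ but not on $\Psi$. This directly connects OFM minimization to optimal transport (Kornilov et al., 19 Mar 2024, Kornilov et al., 31 Oct 2025).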
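A one-dimensional worked example makes this identity concrete. Assume, purely for illustration, a quadratic potential $\Psi(z) = \tfrac{a}{2} z^2 + b z$ with $a > 0$; the symbols $a$, $b$, and $D_t$ are introduced here and are not taken from the cited works. Writing $\overline{\Psi}$ for the convex conjugate of $\Psi$ (the potential appearing in the second term of the dual objective), every quantity can be computed in closed form:

```latex
\begin{aligned}
\phi_{t,\Psi}(z_0) &= (1-t)\,z_0 + t\,(a z_0 + b) = D_t\, z_0 + t b,
  \qquad D_t := 1 - t + t a, \\
z_0 &= \phi_{t,\Psi}^{-1}(x_t) = \frac{x_t - t b}{D_t},
  \qquad x_t = (1-t)\,x_0 + t\,x_1, \\
u_{t,\Psi}(x_t) - (x_1 - x_0) &= (a-1)\,z_0 + b - (x_1 - x_0)
  = \frac{a x_0 - x_1 + b}{D_t}, \\
\int_0^1 \frac{dt}{D_t^{2}} &= \frac{1}{a}
  \;\;\Longrightarrow\;\;
  \mathcal{L}_{\operatorname{OFM}}(\Psi)
  = \frac{1}{a}\,\mathbb{E}_{\pi}\!\left[(a x_0 - x_1 + b)^2\right], \\
\overline{\Psi}(y) &= \frac{(y-b)^2}{2a}
  \;\;\Longrightarrow\;\;
  2\,\mathcal{L}_{\operatorname{OT}}(\Psi)
  = a\,\mathbb{E}_{\mu}[x_0^2] + 2b\,\mathbb{E}_{\mu}[x_0]
  + \frac{1}{a}\,\mathbb{E}_{\nu}\!\left[(x_1 - b)^2\right], \\
\mathcal{L}_{\operatorname{OFM}}(\Psi) - 2\,\mathcal{L}_{\operatorname{OT}}(\Psi)
  &= -2\,\mathbb{E}_{\pi}[x_0 x_1].
\end{aligned}
```

The remainder $-2\,\mathbb{E}_{\pi}[x_0 x_1]$ depends on the coupling but not on $(a, b)$, so it plays the role of the constant. For $\mu = \mathcal{N}(0, 1)$ and $\nu = \mathcal{N}(m, s^2)$, the dual term becomes $2\mathcal{L}_{\operatorname{OT}} = a + (s^2 + (m - b)^2)/a$, which is minimized at $b = m$, $a = s$. The resulting map $\nabla\Psi(z) = s z + m$ is the closed-form Gaussian OT map, whichever coupling $\pi$ is used.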

2. Theoretical Guarantees and Equivalence Results

The dynamical formulation of OT by Benamou–Brenier equates the OT cost to the minimum kinetic action over probability flows:

$\min_{(\rho_t,v_t)} \int_0^1 \int \|v_t(x)\|^2 \rho_t(dx) dt \quad \text{subject to} \quad \partial_t \rho_t + \nabla\cdot(\rho_t v_t) = 0$

OFM restricts the admissible vector fields to those induced by convex potentials, aligning the flow-matching model with the optimal displacement interpolation between $\mu$ and $\nu$ (Kornilov et al., 31 Oct 2025).
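The link between kinetic action and transport cost can be checked numerically in one dimension. The sketch below is an illustration under assumed toy parameters (source $\mathcal{N}(0,1)$, target $\mathcal{N}(3, 2^2)$), with hypothetical helper names. It compares the mean squared displacement of the OT coupling with that of an independent coupling; for straight paths traversed at constant velocity, this mean squared displacement is the kinetic action of the paths.

```python
import random


def kinetic_action(pairs):
    """Action of straight-line paths at constant velocity x1 - x0.

    The velocity does not depend on t, so the time integral of |v|^2
    reduces to the mean of |x1 - x0|^2 over the coupled pairs.
    """
    return sum((x1 - x0) ** 2 for x0, x1 in pairs) / len(pairs)


rng = random.Random(0)
m, s = 3.0, 2.0  # assumed target N(m, s^2); the source is N(0, 1)
n = 50_000
x0s = [rng.gauss(0.0, 1.0) for _ in range(n)]

# OT coupling: x1 = T(x0), with the closed-form Gaussian Brenier map.
ot_pairs = [(x0, m + s * x0) for x0 in x0s]
# Independent coupling: x1 is drawn from the target regardless of x0.
ind_pairs = [(x0, rng.gauss(m, s)) for x0 in x0s]

w2_squared = m ** 2 + (s - 1.0) ** 2  # closed-form W_2^2 for this pair
action_ot = kinetic_action(ot_pairs)
action_ind = kinetic_action(ind_pairs)
print("closed-form W2^2:", w2_squared)
print("action, OT coupling:", action_ot)
print("action, independent coupling:", action_ind)
```

Up to Monte Carlo error, the OT coupling attains the closed-form $W_2^2$, while the independent coupling pays an additional cost for pairing unrelated points.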

Crucially, minimizing the OFM loss over convex potentials recovers the Brenier potential, so sampled trajectories are straight, non-intersecting, and optimal with respect to the Wasserstein-2 cost. Moreover, this holds for any reference coupling $\pi$: the minimizer $\Psi$ is unique up to an additive constant, and the trajectory endpoints are given by the Monge OT map (Kornilov et al., 19 Mar 2024).

3. Algorithmic Implementation and Optimization Procedure

OFM implementation relies on modeling the potential $\Psi$ as an input-convex neural network (ICNN) to ensure convexity, and optimizing the OFM loss via stochastic gradient descent. At each iteration:

Batch Sampling: Draw mini-batches $\{x_{0,i}\}$ from $\mu$ and $\{x_{1,i}\}$ from $\nu$.
Time Sampling: Sample $t$ uniformly from $[0,1]$.
Interpolation: Compute $x_{t,i} = (1-t)x_{0,i} + t x_{1,i}$.
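Inverse Map: For each $x_{t,i}$, compute $z_{0,i} = \phi_{t,\Psi}^{-1}(x_{t,i})$ as the solution of the convex problem (strongly convex for $t < 1$)

$z_{0,i} = \arg\min_z \left[ \frac{1-t}{2} \|z\|^2 + t \Psi(z) - \langle x_{t,i}, z \rangle \right]$

whose first-order condition is $(1-t) z + t \nabla\Psi(z) = x_{t,i}$.
Loss Evaluation: Compute the per-sample losses $\| \nabla\Psi(z_{0,i}) - z_{0,i} - (x_{1,i} - x_{0,i}) \|^2$ and average them over the batch.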
Parameter Update: Update $\theta$ (parameters of $\Psi$ ) using Adam or another first-order optimizer.
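The loss-evaluation part of these steps can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy potential $\Psi(z) = \tfrac{a}{2} z^2 + c\,\mathrm{softplus}(z)$ with $a > 0$, $c \geq 0$ stands in for an ICNN, the inverse map is solved with Newton's method on the first-order condition of the inner problem, and all function names are hypothetical. The parameter update, which requires differentiating through the inner minimization, is omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad_psi(z, a, c):
    """Psi'(z) for the toy convex potential a * z^2 / 2 + c * softplus(z)."""
    return a * z + c * sigmoid(z)


def hess_psi(z, a, c):
    """Psi''(z), strictly positive when a > 0 and c >= 0."""
    s = sigmoid(z)
    return a + c * s * (1.0 - s)


def invert_flow(x_t, t, a, c, iters=50, tol=1e-12):
    """Solve (1 - t) * z + t * Psi'(z) = x_t for z with Newton's method."""
    z = x_t
    for _ in range(iters):
        g = (1.0 - t) * z + t * grad_psi(z, a, c) - x_t
        h = (1.0 - t) + t * hess_psi(z, a, c)
        step = g / h
        z -= step
        if abs(step) < tol:
            break
    return z


def ofm_batch_loss(x0s, x1s, t, a, c):
    """Average of |Psi'(z0) - z0 - (x1 - x0)|^2 over a mini-batch."""
    total = 0.0
    for x0, x1 in zip(x0s, x1s):
        x_t = (1.0 - t) * x0 + t * x1        # interpolation
        z0 = invert_flow(x_t, t, a, c)       # inverse map
        u = grad_psi(z0, a, c) - z0          # model velocity at x_t
        total += (u - (x1 - x0)) ** 2        # per-sample loss
    return total / len(x0s)


rng = random.Random(0)
x0s = [rng.gauss(0.0, 1.0) for _ in range(256)]   # batch from the source
x1s = [rng.gauss(3.0, 2.0) for _ in range(256)]   # batch from the target
t = rng.random()                                  # time sample
print("batch loss:", ofm_batch_loss(x0s, x1s, t, a=1.5, c=1.0))
```

When the pair satisfies $x_1 = \nabla\Psi(x_0)$, the interpolant lies on the model trajectory started at $x_0$, the inverse map returns $x_0$, and the per-sample loss vanishes.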

Sampling at inference time is extremely efficient: a single evaluation of the trained map $x \mapsto \nabla\Psi_\theta(x)$ suffices to generate new samples (Kornilov et al., 19 Mar 2024, Kornilov et al., 31 Oct 2025).
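The claim that one evaluation suffices can be illustrated by comparing the one-step map with explicit Euler integration of the same vector field. The sketch below assumes a toy one-dimensional quadratic potential $\Psi(z) = \tfrac{a}{2} z^2 + b z$ with $a > 0$, chosen because its flow map inverts in closed form; the function names and parameter values are hypothetical.

```python
def grad_psi(z, a, b):
    """Psi'(z) for the toy quadratic potential a * z^2 / 2 + b * z."""
    return a * z + b


def velocity(x, t, a, b):
    """OFM vector field u_t(x) = Psi'(z0) - z0 with z0 = phi_t^{-1}(x)."""
    z0 = (x - t * b) / ((1.0 - t) + t * a)  # closed-form inverse flow map
    return grad_psi(z0, a, b) - z0


def sample_one_step(x0, a, b):
    """One-step generation: apply the gradient of the potential once."""
    return grad_psi(x0, a, b)


def sample_euler(x0, a, b, n_steps):
    """Explicit Euler integration of dx/dt = u_t(x) from t = 0 to t = 1."""
    x = x0
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x += dt * velocity(x, k * dt, a, b)
    return x


x0 = 0.7
print("one step:", sample_one_step(x0, a=2.0, b=3.0))
for n_steps in (1, 4, 32):
    print(n_steps, "Euler steps:", sample_euler(x0, 2.0, 3.0, n_steps))
```

Since each trajectory is a straight line traversed at constant velocity, the Euler iterates stay on that line and agree with the one-step result up to floating-point rounding, independently of the number of steps.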

4. Empirical Behavior and Comparisons

Empirical studies indicate that OFM with a convex-potential parameterization recovers the global OT map without error accumulation and without multi-stage iterative refinement. On standard 2D OT tasks such as Gaussian-to-swiss-roll or checkerboard distributions, OFM matches the ground-truth OT map, achieves minimal path energy, and yields non-intersecting, linear trajectories. On image synthesis tasks, mini-batch OT-based FM (BatchOT) offers only modest improvements over random coupling, whereas OFM is reported to achieve consistently better sample quality and path straightness with fewer integration steps (see the table below, data from (Lin et al., 29 May 2025); lower FID is better):

Method	1-Step FID	4-Step FID	128-Step FID
Flow Matching	324.04	36.85	11.05
BatchOT (OT-FM)	314.93	36.64	11.27
MAC (model-OT)	35.47	19.14	10.44

OFM’s strength lies in the one-step and few-step regimes, where it outperforms FM trained with random (independent) coupling and is competitive with, or better than, diffusion-based models at orders-of-magnitude lower inference cost.

5. Extensions: Unbalanced OT, Consistency Models, and Optimal Control

OFM, while originally posed for balanced quadratic cost, underpins several generalizations:

Unbalanced OT: OTFM frameworks for pansharpening and other conditional tasks embed dual unbalanced OT regularization into FM’s training loop, relaxing marginal constraints and allowing robust mapping under class or modality mismatch (Cao et al., 19 Mar 2025).
Consistency and Flow Map Matching: Flow Map Matching (FMM) generalizes OFM by directly matching two-time flow maps via Lagrangian or Eulerian distillation losses, unifying few-step consistency models and progressive distillation (Boffi et al., 11 Jun 2024).
Optimal Acceleration Transport: OAT-FM replaces mere velocity straightness with a second-order condition, minimizing integrated path acceleration for stricter straightness guarantees and improved generative quality, particularly as a fine-tuning phase (Yue et al., 29 Sep 2025).
Optimal Control for Guidance: OFM interpretation within the optimal control paradigm enables controlled generation, guided via terminal or running costs; this yields consistent improvements in applications such as text-guided image synthesis and multi-subject fidelity (Wang et al., 23 Oct 2024, Bill et al., 2 Oct 2025).

6. Limitations and Open Challenges

Despite its theoretical guarantees, OFM exhibits several practical constraints:

Computational Cost: In baseline OT-based FM (BatchOT), solving the OT coupling per batch incurs $\mathcal{O}(B^2)$ – $\mathcal{O}(B^3)$ cost, mitigated in OFM by the use of analytic or amortized convex potentials but still a concern for high dimensions (Lin et al., 29 May 2025).
Model Mismatch and Real-World Distributions: OFM’s theoretical optimality assumes convexity, absolute continuity, and sufficient model expressiveness. In high-dimensional settings or for distributions with complex supports, strictly linear (straight) trajectories may not always connect source and target supports without pathological artifacts; extensions using unbalanced OT or acceleration transport aim to relax these assumptions (Cao et al., 19 Mar 2025, Yue et al., 29 Sep 2025).
Empirical Limitation of Pure OT Matching: Geometry-only couplings can result in conflicting velocity targets at overlapping regions in the data space, causing averaging and loss of straightness in the learned flows (Lin et al., 29 May 2025).
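To make the per-batch coupling cost in the first item tangible, the following sketch computes a mini-batch OT pairing for one-dimensional samples. It relies on the fact that, for quadratic cost and equal batch sizes in one dimension, the optimal pairing matches the k-th smallest source point to the k-th smallest target point, and it verifies this against an exhaustive search over all $B!$ pairings for a tiny batch. The helper names are hypothetical; in higher dimensions sorting is no longer available and an assignment or entropic solver is needed, which is the setting of the quadratic-to-cubic cost quoted above.

```python
import itertools
import random


def pairing_cost(x0s, x1s, perm):
    """Mean squared distance when x0s[i] is paired with x1s[perm[i]]."""
    return sum((x0s[i] - x1s[j]) ** 2 for i, j in enumerate(perm)) / len(x0s)


def sorted_coupling(x0s, x1s):
    """1D quadratic-cost OT pairing: match points by rank (sorting)."""
    order0 = sorted(range(len(x0s)), key=x0s.__getitem__)
    order1 = sorted(range(len(x1s)), key=x1s.__getitem__)
    perm = [0] * len(x0s)
    for i, j in zip(order0, order1):
        perm[i] = j
    return perm


def brute_force_coupling(x0s, x1s):
    """Exhaustive search over all B! pairings; feasible only for tiny B."""
    indices = range(len(x0s))
    return min(itertools.permutations(indices),
               key=lambda p: pairing_cost(x0s, x1s, p))


rng = random.Random(0)
batch = 6
x0s = [rng.gauss(0.0, 1.0) for _ in range(batch)]
x1s = [rng.gauss(3.0, 2.0) for _ in range(batch)]

random_perm = list(range(batch))  # samples paired in the order drawn
ot_perm = sorted_coupling(x0s, x1s)
best_perm = brute_force_coupling(x0s, x1s)
print("random pairing cost:", pairing_cost(x0s, x1s, random_perm))
print("sorted pairing cost:", pairing_cost(x0s, x1s, ot_perm))
print("brute-force optimum:", pairing_cost(x0s, x1s, best_perm))
```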

7. Broader Impact and Theoretical Foundations

OFM is directly connected to classical results in OT, including the Benamou–Brenier dynamical formulation, the uniqueness of the Brenier potential, and the convergence rates in Wasserstein metrics. Recent analysis confirms that flow matching under suitable parameterizations achieves minimax-optimal convergence rates in $W_p$ distance ( $1 \leq p \leq 2$ ), matching the rates of stochastic score-based diffusion but via deterministic, simulation-free ODEs (Fukumizu et al., 31 May 2024). The direct connection between OFM and action matching (AM) for entire curves of distributions further elevates OFM as a unifying lens on dynamic generative modeling (Kornilov et al., 31 Oct 2025). This centrality underlines OFM’s broad relevance for fast, theoretically grounded generation in high-dimensional spaces.