WFR Mean Flow Matching Framework

Updated 4 February 2026

WFR-MFM is a computational framework for dynamic unbalanced optimal transport that directly learns mean flows of velocity and mass growth over arbitrary intervals.
It employs neural network parameterizations to regress against analytic geodesic targets, significantly reducing inference steps compared to traditional ODE solvers.
Empirical benchmarks demonstrate up to 10³× faster inference and top-ranked accuracy on metrics like 1-Wasserstein and Relative Mass Error in applications such as single-cell genomics.

Wasserstein-Fisher-Rao Mean Flow Matching (WFR-MFM) is a computational and modeling framework for solving dynamic unbalanced optimal transport (UOT) problems, particularly under the Wasserstein-Fisher-Rao (WFR) geometry. WFR-MFM enables efficient inference of coupled mass transport and mass variation over time, addressing both computational bottlenecks and predictive accuracy in time-dependent density evolution for applications such as single-cell genomics. Its distinguishing feature is the direct learning of interval-averaged dynamics—mean flows of velocity and mass-growth—over arbitrary time intervals, replacing the need for computationally intensive trajectory simulation with direct, one-step transformation rules (Wang et al., 28 Jan 2026).

1. Mathematical Foundations

The WFR dynamic unbalanced OT problem is formalized by a continuity equation with growth:

$\partial_t \rho(t,x) + \nabla \cdot [\rho(t,x) u(t,x)] = g(t,x) \rho(t,x)$

Here, $\rho(t,x)$ is the time-dependent density (possibly non-mass-conserving), $u(t,x)$ is a velocity field, and $g(t,x)$ is the instantaneous mass-growth (birth-death) rate. The WFR distance of order two between nonnegative measures $\mu_0$, $\mu_1$ is:

$\mathrm{WFR}_\delta^2(\mu_0,\mu_1) = \inf_{\rho,u,g} \int_0^1 \int_x \tfrac12 \left( \|u(t,x)\|^2 + \delta^2 g(t,x)^2 \right) \rho(t,x)\, dx\, dt$

subject to the above unbalanced continuity constraint and endpoint conditions $\rho(0)=\mu_0$, $\rho(1)=\mu_1$. The “mass variation penalty” $\delta$ tunes the cost of mass change relative to transport (Wang et al., 28 Jan 2026, Peng et al., 11 Jan 2026).

The geodesic between two Diracs $m_0\delta_{x_0}\to m_1\delta_{x_1}$ admits a closed-form dynamic path describing simultaneous displacement and mass evolution, foundational for constructing conditional couplings and analytic training targets.
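For a single weighted particle (a traveling Dirac), the action integrand above reduces to $\tfrac12(\|u\|^2 + \delta^2 g^2)\, m(t)$ with $\dot m = g\, m$. The following is a minimal sketch of how $\delta$ weighs growth against transport; the function name `particle_action` and the midpoint quadrature are illustrative assumptions, not part of the cited work.

```python
import math

def particle_action(u, g, m0, delta, n=10000):
    """Approximate the WFR action of one weighted particle on [0, 1].

    u(t) returns the particle velocity as a tuple, g(t) its growth rate.
    The mass follows dm/dt = g(t) * m(t), tracked in log space.
    (Hypothetical helper for illustration only.)
    """
    dt = 1.0 / n
    log_m = math.log(m0)
    total = 0.0
    for k in range(n):
        t_mid = (k + 0.5) * dt
        g_mid = g(t_mid)
        speed_sq = sum(c * c for c in u(t_mid))
        # Mass at the midpoint of the current step.
        m_mid = math.exp(log_m + 0.5 * dt * g_mid)
        total += 0.5 * (speed_sq + delta ** 2 * g_mid ** 2) * m_mid * dt
        log_m += dt * g_mid
    return total

# Pure transport at unit speed versus pure growth at rate 0.5, both with m0 = 2.
transport_cost = particle_action(lambda t: (1.0, 0.0), lambda t: 0.0, m0=2.0, delta=1.5)
growth_cost = particle_action(lambda t: (0.0, 0.0), lambda t: 0.5, m0=2.0, delta=1.5)
```

Raising `delta` increases `growth_cost` quadratically while leaving `transport_cost` unchanged, which is the sense in which $\delta$ penalizes mass variation.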

2. Mean Flow Matching Paradigm

WFR-MFM introduces a mean-flow matching framework by learning the average effect of transport and growth over arbitrary intervals $[t, T]$:

Mean velocity:

$v(x, t, T) = \frac{1}{T-t} \int_t^T u_\tau(x_\tau) d\tau$

Mean growth:

$h(x, t, T) = \frac{1}{T-t} \int_t^T g_\tau(x_\tau) d\tau$

where $x_\tau$ evolves under $u_\tau$ with $x_t = x$.
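To make these definitions concrete, the sketch below averages $u$ and $g$ along a numerically integrated trajectory. The helper `mean_flows`, the scalar state, and the toy fields $u_t(x) = -x$, $g_t(x) = x$ are assumptions chosen only because they admit a closed form to compare against.

```python
import math

def mean_flows(u, g, x, t, T, n=2000):
    """Average u and g along the trajectory started at x at time t, over [t, T].

    Uses midpoint (RK2) steps for the trajectory and the same midpoints
    for the time averages. Scalar state for simplicity.
    """
    dt = (T - t) / n
    sum_u = 0.0
    sum_g = 0.0
    tau = t
    for _ in range(n):
        x_mid = x + 0.5 * dt * u(tau, x)
        tau_mid = tau + 0.5 * dt
        u_mid = u(tau_mid, x_mid)
        sum_u += u_mid * dt
        sum_g += g(tau_mid, x_mid) * dt
        x += dt * u_mid
        tau += dt
    return sum_u / (T - t), sum_g / (T - t)

# Toy fields (assumed for illustration): u_t(x) = -x, g_t(x) = x.
x0, t0, T0 = 2.0, 0.2, 1.0
v_num, h_num = mean_flows(lambda tau, x: -x, lambda tau, x: x, x0, t0, T0)

# Closed form for this toy case: x_tau = x0 * exp(-(tau - t0)).
span = T0 - t0
v_ref = x0 * (math.exp(-span) - 1.0) / span
h_ref = x0 * (1.0 - math.exp(-span)) / span
```

Because $\int_t^T u_\tau(x_\tau)\, d\tau = x_T - x_t$, the mean velocity is simply the net displacement divided by the interval length.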

Consistency identities relate $(v,h)$ back to the instantaneous $(u,g)$:

$\begin{aligned} v(x, t, T) &= u_t(x) + (T-t)\left[\partial_t v + (\nabla_x v)\cdot u_t\right](x, t, T) \\ h(x, t, T) &= g_t(x) + (T-t)\left[\partial_t h + (\nabla_x h)\cdot u_t\right](x, t, T) \end{aligned}$
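These identities follow from differentiating the defining integrals with respect to the start time $t$ along the trajectory. A short derivation for $v$ (the argument for $h$ is identical, with $g$ in place of $u$ inside the integral) starts from

$(T-t)\, v(x_t, t, T) = \int_t^T u_\tau(x_\tau)\, d\tau$

Taking the total derivative in $t$ on both sides, with $\dot{x}_t = u_t(x_t)$, gives

$-v(x_t, t, T) + (T-t)\left[\partial_t v + (\nabla_x v)\cdot u_t\right](x_t, t, T) = -u_t(x_t)$

and rearranging yields the first identity. In the limit $T \to t$ the correction term vanishes, so $v(x,t,t) = u_t(x)$ and $h(x,t,t) = g_t(x)$.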

A key property is additivity: the integrated mean flow $(T-t)\,v(x,t,T)$ over $[t,T]$ equals the sum of the integrated mean flows over consecutive subintervals, each evaluated at the state reached along the evolved path, and the same holds for $(T-t)\,h(x,t,T)$. This enables both one-step and multi-step interval-based inference schemes without numerical ODE integration (Wang et al., 28 Jan 2026).
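The additivity property can be checked numerically on a toy case. The sketch below assumes the velocity field $u_t(x) = -x$, for which the mean velocity has the closed form $v(x,t,T) = x\,(e^{-(T-t)} - 1)/(T-t)$; the names `v_exact` and `displacement` are hypothetical.

```python
import math

def v_exact(x, t, T):
    """Closed-form mean velocity for the assumed toy field u_t(x) = -x."""
    if T == t:
        return -x  # the mean flow reduces to the instantaneous velocity
    return x * (math.exp(-(T - t)) - 1.0) / (T - t)

def displacement(x, t, T):
    """Integrated mean flow (T - t) * v(x, t, T)."""
    return (T - t) * v_exact(x, t, T)

x, t, s, T = 1.5, 0.0, 0.3, 1.0
x_s = x + displacement(x, t, s)          # state reached at the split time s
whole = displacement(x, t, T)            # one interval [t, T]
split = displacement(x, t, s) + displacement(x_s, s, T)  # [t, s] then [s, T]
```

Here `whole` and `split` agree up to floating-point rounding, regardless of where the split time `s` is placed.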

3. Learning Objectives and Parameterization

WFR-MFM parametrizes the mean velocity $v_\theta$ and mean growth $h_\phi$ as neural networks, trained to regress against analytic targets derived from closed-form geodesics between optimal pairs $(x_0, x_1)$. The main conditional loss is:

$L_c(\theta, \phi) = \mathbb{E}_{t<T,\, z\sim q,\, x\sim \rho_t(\cdot|z)} \left[ \|v_\theta(x,t,T) - \text{sg}[v(x,t,T|z)]\|^2 + \lambda \|h_\phi(x,t,T) - \text{sg}[h(x,t,T|z)]\|^2 \right] m_t(z)$

Here, $\lambda$ balances accuracy between transport and mass components, $\text{sg}[\cdot]$ denotes stop-gradient, $m_t(z)$ is the evolving mass, and $z=(x_0,x_1)$ is sampled from an optimal coupling. No geometric regularizer beyond the WFR cost is required (Wang et al., 28 Jan 2026).

This conditional mean-flow loss is constructed using samples from the optimal WFR coupling, analytic instantaneous fields, and derived mean-flow consistency formulas, constituting a regression problem with closed-form supervision.
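As a rough sketch of the bracketed term in $L_c$, the function below computes a mass-weighted squared error over a batch, given precomputed predictions and targets. It is written in plain Python with lists, so the targets are constants by construction; in an automatic-differentiation framework they would be wrapped in a stop-gradient. The function name and argument layout are assumptions for illustration.

```python
def conditional_mfm_loss(v_pred, v_target, h_pred, h_target, mass, lam):
    """Batch average of m * (||v_pred - v_target||^2 + lam * (h_pred - h_target)^2).

    v_pred, v_target: lists of equal-length vectors (lists of floats).
    h_pred, h_target, mass: lists of floats, one entry per sample.
    """
    total = 0.0
    for vp, vt, hp, ht, m in zip(v_pred, v_target, h_pred, h_target, mass):
        v_err = sum((a - b) ** 2 for a, b in zip(vp, vt))
        h_err = (hp - ht) ** 2
        total += m * (v_err + lam * h_err)
    return total / len(mass)

# Two-sample toy batch with made-up numbers.
loss = conditional_mfm_loss(
    v_pred=[[1.0, 0.0], [0.0, 0.0]],
    v_target=[[0.0, 0.0], [0.0, 0.0]],
    h_pred=[0.5, 1.0],
    h_target=[0.0, 0.0],
    mass=[2.0, 1.0],
    lam=0.1,
)
```

The mass factor means that samples carrying more mass at time $t$ contribute proportionally more to the objective.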

4. Inference Algorithm and Complexity

Inference under WFR-MFM proceeds via direct application of learned mean-flow nets:

One-step inference: Given $(x_0, m_0)$ at $t=0$, output

$x_1 = x_0 + v_\theta(x_0, 0, 1), \quad m_1 = m_0 \exp[ h_\phi(x_0, 0, 1) ]$

Multi-step inference: For times $0=t_0<\ldots<t_K=1$, recursively apply

$x_{t_{k+1}} = x_{t_k} + (t_{k+1}-t_k) v_\theta(x_{t_k}, t_k, t_{k+1}), \quad m_{t_{k+1}} = m_{t_k} \exp[ (t_{k+1}-t_k) h_\phi(x_{t_k}, t_k, t_{k+1}) ]$
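The two update rules can be sketched as follows. The callables `v_theta` and `h_phi` stand in for trained networks; here they are replaced by exact mean flows of an assumed toy system ($u_t(x) = -x$, constant growth rate $0.3$) so that the result can be verified. The state is a scalar for brevity.

```python
import math

def one_step(x0, m0, v_theta, h_phi):
    """Map (x0, m0) at t = 0 directly to t = 1 with one evaluation of each net."""
    return x0 + v_theta(x0, 0.0, 1.0), m0 * math.exp(h_phi(x0, 0.0, 1.0))

def multi_step(x0, m0, v_theta, h_phi, times):
    """Apply the interval update over consecutive pairs of the time grid."""
    x, m = x0, m0
    for t, T in zip(times[:-1], times[1:]):
        dt = T - t
        # Evaluate both mean flows at the current state before updating it.
        v = v_theta(x, t, T)
        h = h_phi(x, t, T)
        x = x + dt * v
        m = m * math.exp(dt * h)
    return x, m

# Stand-ins for trained networks (assumed toy system, not learned models).
def v_toy(x, t, T):
    return x * (math.exp(-(T - t)) - 1.0) / (T - t)

def h_toy(x, t, T):
    return 0.3

x1_a, m1_a = one_step(1.0, 2.0, v_toy, h_toy)
x1_b, m1_b = multi_step(1.0, 2.0, v_toy, h_toy, [0.0, 0.25, 0.5, 0.75, 1.0])
```

With exact mean flows, the one-step and four-step results coincide by additivity. With learned networks they generally differ, which is where the choice of $K$ matters.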

This scheme requires one, or a small number of, neural network evaluations per sample. ODE-based inference (e.g., adaptive RK5 or fixed-step Euler) instead requires $O(K)$ forward passes for $K$ solver steps and is reported to incur at least two orders of magnitude higher computational cost for similar accuracy (Wang et al., 28 Jan 2026).
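The difference in evaluation counts can be illustrated with a fixed-step Euler integrator on an assumed toy system ($u_t(x) = -x$, $g_t(x) = x$). The exact mean flow used for comparison plays the role of a perfectly trained network; a learned network would carry its own approximation error, so this sketch shows only the cost structure, not the reported benchmark numbers.

```python
import math

def euler_integrate(u, g, x0, m0, n_steps):
    """Integrate position and log-mass on [0, 1] with explicit Euler steps.

    Returns the final state and the number of field evaluations used.
    """
    x, log_m = x0, math.log(m0)
    dt = 1.0 / n_steps
    n_evals = 0
    for k in range(n_steps):
        t = k * dt
        ux, gx = u(t, x), g(t, x)
        n_evals += 1  # one joint evaluation of (u, g) per step
        x += dt * ux
        log_m += dt * gx
    return x, math.exp(log_m), n_evals

x0, m0 = 1.0, 2.0
x_euler, m_euler, evals = euler_integrate(lambda t, x: -x, lambda t, x: x, x0, m0, 100)

# One evaluation of the exact mean flows over [0, 1] for the same toy system.
v01 = x0 * (math.exp(-1.0) - 1.0)
h01 = x0 * (1.0 - math.exp(-1.0))
x_mean, m_mean = x0 + v01, m0 * math.exp(h01)
```

In this toy case the Euler result uses 100 evaluations and still carries a discretization error, whereas the mean-flow update uses a single evaluation.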

5. Empirical Results and Benchmarks

Quantitative results demonstrate the advantages of WFR-MFM:

Inference speed: Achieves $10^2$–$10^3\times$ faster inference compared to adaptive RK5 WFR-FM, and approximately $10^2\times$ faster than 100-step Euler (Wang et al., 28 Jan 2026).
Accuracy: Ranks top-2 in both 1-Wasserstein ($W_1$) and Relative Mass Error (RME) across synthetic (Gene, Dyngen, Gaussian mixtures) and real (EMT, EB, CITE, Mouse) scRNA-seq datasets, closely matching multi-step WFR-FM (Wang et al., 28 Jan 2026).
Speed-accuracy trade-off: Increasing the number of inference steps from 1 to $K$ refines precision at linearly increasing cost, enabling a controlled trade-off (Fig 2).
Scalability: Maintains low GPU memory usage ($\leq$ 8 GiB) and sub-second inference on large ($\geq$ 100D) datasets (Wang et al., 28 Jan 2026).
Generalization: Demonstrates efficient prediction of perturbation responses in high-throughput regimes, e.g., 5000 unseen perturbations over 10,000 cells in 6.6 s with WFR-MFM versus $\sim$2000 s for WFR-FM ($W_1=0.115$, RME = 0.069) (Wang et al., 28 Jan 2026).

6. Hyperparameters, Practical Strategies, and Limitations

Selection of the mass-variation penalty $\delta$, cross-time sampling ratio $p_\mathrm{diff}$, and growth loss weight $\lambda$ is critical:

$\delta$ determines the cost of mass change; typical values: 1–3 for low-dim data, $\sim$10–30 for high-dim gene data.
$p_\mathrm{diff}$ (fraction of off-diagonal interval samples during training): empirically $0.4$–$0.6$ offers stability.
$\lambda$ balances transport and mass accuracy (range $0.01$–$0.1$).
Underfitting on curved geodesics: Using $K>1$ inference steps remedies limitations of strict one-step inference.
Recommended pipeline: Estimate a preliminary OT coupling to choose a sensible $\delta$, set $p_\mathrm{diff}=0.5$ and $\lambda=0.05$, then monitor $W_1$ and RME on a held-out set to guide further adjustment (Wang et al., 28 Jan 2026).
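One way to read $p_\mathrm{diff}$ is as the probability that a training sample uses a genuine interval ($t < T$) rather than a diagonal one ($t = T$). The sketch below implements that reading with uniform time sampling; the sampler, the uniform distribution, and the `config` dictionary are assumptions for illustration. Only the two default values are taken from the recommended pipeline above.

```python
import random

# Starting values from the recommended pipeline; delta is left to be chosen
# per dataset (the text gives ranges, not a single default).
config = {"p_diff": 0.5, "lam": 0.05}

def sample_interval(p_diff, rng):
    """Draw (t, T) in [0, 1] with t <= T.

    With probability p_diff the pair is off-diagonal (two independent
    uniform draws, sorted); otherwise T = t, where the mean flow reduces
    to the instantaneous field.
    """
    if rng.random() < p_diff:
        a, b = rng.random(), rng.random()
        return min(a, b), max(a, b)
    t = rng.random()
    return t, t

rng = random.Random(0)
pairs = [sample_interval(config["p_diff"], rng) for _ in range(10000)]
off_diagonal_fraction = sum(1 for t, T in pairs if t != T) / len(pairs)
```

Over many draws, `off_diagonal_fraction` approaches `p_diff`, so the setting directly controls how much of the training signal comes from interval-level versus instantaneous supervision.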

A plausible implication is that WFR-MFM, by sidestepping explicit ODE simulation, can be adapted to large-scale and high-throughput settings—particularly in single-cell omics—where traditional time-integration-based UOT solvers are computationally prohibitive.

7. Relation to Prior and Concurrent Frameworks

WFR-MFM generalizes simulation-free flow matching (FM) concepts (Peng et al., 11 Jan 2026) to the unbalanced, mass-variant regime:

WFR-FM (Peng et al., 11 Jan 2026) parameterizes velocity and growth fields using neural networks and regresses against closed-form conditional geodesic targets derived from traveling Dirac solutions. WFR-MFM extends this by directly learning interval-averaged mean flows, supporting inference over arbitrary time intervals with additive composability.
Continuous Normalizing Flows for spherical WFR (Jing et al., 2022) employ Benamou-Brenier dynamic formulations with mass change, regularized by KL divergence and velocity penalties, to generate weighted samples under spherical WFR. These approaches require ODE simulation, whereas WFR-MFM achieves competitive accuracy with $O(1)$ inference cost per sample.
Connections to static WFR theory: The Dirac–Dirac geodesic provides analytic construction of conditional couplings central to WFR-MFM and WFR-FM methodologies (Peng et al., 11 Jan 2026).

This synthesis of dynamic unbalanced OT with neural interval-mean flow regression constitutes the core advance of WFR-MFM for scalable modeling and inference in temporally structured, mass-evolving systems.