
Momentum Flow Matching (MFM)

Updated 8 February 2026
  • MFM is a generative modeling paradigm that injects time-dependent Gaussian noise into its velocity field, balancing deterministic behavior near data with stochastic exploration near noise.
  • The method discretizes straight-line trajectories into segments using a recursive velocity field formulation, enabling efficient sampling and enhanced multi-scale noise modeling.
  • Empirical evaluations on datasets like CIFAR-10 and CelebA-HQ demonstrate that MFM can reduce FID and improve recall by tuning decay rates and anchor points to control the trade-off between efficiency and diversity.

Momentum Flow Matching (MFM) is a generative modeling paradigm that introduces a stochastic, velocity-field-based approach to sampling between data and noise distributions, generalizing rectified flow (RF) models by discretizing straight-line trajectories into a sequence of variable "momentum fields." MFM interpolates between the efficiency of fast, straight-path RF sampling and the diversity and multi-scale noise modeling of diffusion models by injecting time-dependent Gaussian noise directly into the velocity field at each step, rather than into the state variable. This architecture aims to balance fast, efficient sampling near the data manifold with increasingly stochastic, exploratory dynamics as the trajectory approaches the noise manifold, thereby improving generation diversity without incurring the computational costs of traditional diffusion.

1. Motivation and Limitations of Rectified Flow

Rectified flow (RF) models define a deterministic ordinary differential equation (ODE) with constant velocity:

$$\dfrac{d\bm{x}_t}{dt} = \bm{v} = \bm{x}_1 - \bm{x}_0$$

where $\bm{x}_0 \sim \pi_0$ is a data sample and $\bm{x}_1 \sim \pi_1$ is drawn from a noise distribution. This produces highly efficient, straight-line sampling trajectories requiring as little as one integration step. However, this fixed-velocity strategy restricts expressiveness:

  • Low diversity: All trajectories follow nearly the same path in data space, limiting sample variability.
  • Poor multi-scale noise modeling: There is no progressive corruption or denoising; the model is forced to ignore the granular evolution of noise scales over time.
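
For concreteness, the straight-line RF update can be sketched as plain Euler integration; `velocity_net` below is a hypothetical trained predictor of $\bm{x}_1 - \bm{x}_0$, not anything defined in the source:

```python
import numpy as np

def rf_sample(velocity_net, x1, n_steps=1):
    """Euler-integrate the RF ODE dx/dt = v from noise x1 (t=1) back toward data (t=0).

    Because the learned velocity is (approximately) constant along each
    trajectory, even a single Euler step can land near the data manifold.
    """
    x, dt = np.asarray(x1, dtype=float), 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt                 # current time in [0, 1]
        x = x - dt * velocity_net(x, t)  # move against v = x1 - x0
    return x

# Toy check: with the exact velocity v = x1 - x0, one step recovers x0.
x0 = np.array([0.3, -1.2])               # "data" point
x1 = np.array([1.5, 0.4])                # "noise" point
exact_v = lambda x, t: x1 - x0
print(np.allclose(rf_sample(exact_v, x1, n_steps=1), x0))  # True
```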

While diffusion models address both issues via hundreds or thousands of stochastic steps, their computational overhead remains substantial. MFM seeks to blend the efficiency of RF with the richness of diffusion, traversing deterministic paths near $\pi_0$ and smoothly transitioning to highly stochastic exploration near $\pi_1$ (Ma et al., 10 Jun 2025).

2. Formulation and Learning Objective

MFM introduces a time-indexed velocity field $\bm{v}_t$ at each sub-path, determined recursively as follows:

  • Initialization:

$$\bm{v}_0 = \beta (\bm{\epsilon}_0 - \bm{x}_0), \quad \bm{\epsilon}_0 \sim \mathcal{N}(0, I)$$

  • Recursion for $0 < t < T$:

$$\bm{v}_t = \sqrt{\gamma}\,\bm{v}_{t-1} + \sqrt{1-\gamma}\,\beta\,\bm{\epsilon}_t, \quad \bm{\epsilon}_t \sim \mathcal{N}(0, I)$$

  • Terminal velocity:

$$\bm{v}_T = \beta\,\bm{\epsilon}_T$$

Here, $T$ is the number of anchor points, $\gamma$ (with $0 < \gamma < 1$) is a decay factor, and

$$\beta = \frac{\sqrt{\gamma} - 1}{\sqrt{\gamma^T} - 1}$$

As a state update, each anchor point in the path is given by:

$$\bm{z}_t = \bm{z}_{t-1} + \bm{v}_{t-1}, \quad \bm{z}_0 = \bm{x}_0$$
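
A quick way to see what $\beta$ does: from the recursion, $\mathbb{E}[\bm{v}_t \mid \bm{v}_0] = \sqrt{\gamma^t}\,\bm{v}_0$, so the expected endpoint is $\mathbb{E}[\bm{z}_T] = \bm{x}_0 + \beta\left(\sum_{t=0}^{T-1}\sqrt{\gamma^t}\right)(\bm{\epsilon}_0 - \bm{x}_0)$, and the geometric sum equals $1/\beta$, giving $\mathbb{E}[\bm{z}_T] = \bm{\epsilon}_0$. A minimal numeric check of this normalization (a sketch of ours, not from the source):

```python
import math

def beta(gamma: float, T: int) -> float:
    """beta = (sqrt(gamma) - 1) / (sqrt(gamma^T) - 1)."""
    return (math.sqrt(gamma) - 1.0) / (math.sqrt(gamma**T) - 1.0)

def expected_displacement_scale(gamma: float, T: int) -> float:
    """beta * sum_{t=0}^{T-1} sqrt(gamma^t): expected total displacement scale.

    When this equals 1, the expected endpoint z_T lands exactly on eps_0.
    """
    return beta(gamma, T) * sum(math.sqrt(gamma**t) for t in range(T))

for gamma, T in [(0.5, 2), (0.9, 5), (0.99, 10)]:
    print(round(expected_displacement_scale(gamma, T), 12))  # 1.0 every time
```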

The velocity $\bm{v}_t$ can also be expressed in closed form as:

$$\bm{v}_t = \sqrt{\gamma^t}\,\bm{v}_0 + \sqrt{1-\gamma^t}\,\beta\,\bm{\epsilon}_t$$
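
The recursion and the closed form can be reconciled numerically: conditioned on $\bm{v}_0$, both give $\mathrm{Var}(\bm{v}_t) = (1-\gamma^t)\beta^2$ per coordinate. A Monte Carlo sketch (scalar case; the setup is ours):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, t, n = 0.8, 4, 200_000
beta = 1.0  # beta scales both forms identically, so fix it to 1 here

# Recursion: v_t = sqrt(gamma) v_{t-1} + sqrt(1-gamma) beta eps_t, with v_0 = 0
v = np.zeros(n)
for _ in range(t):
    v = np.sqrt(gamma) * v + np.sqrt(1 - gamma) * beta * rng.standard_normal(n)

empirical = v.var()
closed_form = (1 - gamma**t) * beta**2  # variance of sqrt(1-gamma^t) beta eps
print(abs(empirical - closed_form) < 0.01)  # True (up to Monte Carlo error)
```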

A neural network $u_\theta(\bm{z}, t)$ is trained to predict $\bm{v}_t$ at continuous time $t$ (rescaled so that $t \in [0, 1]$). The learning objective ("momentum flow matching loss") is:

$$L_{\mathrm{mfm}}(\theta) = \mathbb{E}_{t \sim U[0,1]} \left\| u_\theta(\bm{z}_t, t) - \bm{v}_t \right\|^2$$
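
A toy construction of training pairs for this loss, assuming discrete anchor times and a stand-in predictor (the helpers `mfm_training_pair` and `u_theta` are ours, and the paper's continuous-time rescaling is only approximated here by $t/T$):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, T = 0.9, 5
beta = (np.sqrt(gamma) - 1) / (np.sqrt(gamma**T) - 1)

def mfm_training_pair(x0):
    """Simulate the forward recursion to a random anchor t and return
    (z_t, t/T, v_t): network input, scaled time, and regression target."""
    eps0 = rng.standard_normal(x0.shape)
    v = beta * (eps0 - x0)
    z = x0.copy()
    t = int(rng.integers(1, T + 1))
    for _ in range(t):
        z = z + v
        v = np.sqrt(gamma) * v + np.sqrt(1 - gamma) * beta * rng.standard_normal(x0.shape)
    return z, t / T, v

def mfm_loss(u_theta, x0_batch):
    """Monte Carlo estimate of E || u_theta(z_t, t) - v_t ||^2."""
    return float(np.mean([np.sum((u_theta(z, t) - v) ** 2)
                          for z, t, v in map(mfm_training_pair, x0_batch)]))

# A zero predictor gives a finite, non-negative baseline loss.
x0_batch = rng.standard_normal((8, 2))
print(mfm_loss(lambda z, t: np.zeros_like(z), x0_batch) >= 0.0)  # True
```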

3. Algorithmic Structure and Sampling Procedures

MFM divides the straight-line path from $\bm{x}_0$ to $\bm{x}_1$ into $T$ segments using anchor points $\bm{z}_0, \ldots, \bm{z}_T$. Each segment is governed by a time-varying velocity, evolving from deterministic ($t = 0$) to fully stochastic ($t = T$). MFM departs from classical noise injection by corrupting the velocity $\bm{v}_t$ at each step via the recursion above, rather than corrupting the state $\bm{x}$ directly.

Forward Process (Trajectory Generation)

  1. Initialize $\bm{z}_0 = \bm{x}_0$, $\bm{v}_0 = \beta(\bm{\epsilon}_0 - \bm{x}_0)$.
  2. Iterate for $t = 1$ to $T$:
    • Draw $\bm{\epsilon}_t \sim \mathcal{N}(0, I)$.
    • Compute $\bm{z}_t = \bm{z}_{t-1} + \bm{v}_{t-1}$.
    • Update $\bm{v}_t = \sqrt{\gamma}\,\bm{v}_{t-1} + \sqrt{1-\gamma}\,\beta\,\bm{\epsilon}_t$.
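
The steps above, plus the endpoint property $\mathbb{E}[\bm{z}_T \mid \bm{x}_0, \bm{\epsilon}_0] = \bm{\epsilon}_0$ (which follows from the choice of $\beta$), can be checked with a short simulation; the helper name and the Monte Carlo check are ours, not from the source:

```python
import numpy as np

rng = np.random.default_rng(2)
gamma, T = 0.7, 8
beta = (np.sqrt(gamma) - 1) / (np.sqrt(gamma**T) - 1)

def forward_trajectory(x0, eps0):
    """Steps 1-2 above: z_0 = x0, v_0 = beta (eps0 - x0), then iterate."""
    z, v = x0.copy(), beta * (eps0 - x0)
    traj = [z.copy()]
    for _ in range(T):
        z = z + v
        v = np.sqrt(gamma) * v + np.sqrt(1 - gamma) * beta * rng.standard_normal(x0.shape)
        traj.append(z.copy())
    return np.stack(traj)  # shape (T+1, dim): anchors z_0, ..., z_T

# Averaged over the injected noise, the endpoint z_T concentrates on eps_0.
x0, eps0 = np.array([2.0, -1.0]), np.array([-0.5, 0.8])
endpoints = np.stack([forward_trajectory(x0, eps0)[-1] for _ in range(20_000)])
print(np.allclose(endpoints.mean(axis=0), eps0, atol=0.05))  # True
```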

Reverse Process (Sampling from $\pi_0$)

  1. Start from $\bm{z}_T \sim \pi_1$.
  2. For each $t = T$ down to $1$:
    • Treat each sub-segment as a rectified-flow matching problem.
    • Use the trained $u_\theta$ to solve $\frac{d\zeta}{dm} = u_\theta(\zeta, m)$, starting from $\zeta(0) = \bm{z}_t$ and integrating to $m = 1$ to recover $\bm{z}_{t-1}$.
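
This reverse loop can be sketched with per-segment Euler integration; `u_theta` below is a stand-in for the trained network, with its signature extended by the anchor index $t$ for clarity (an assumption on our part):

```python
import numpy as np

def sample_reverse(u_theta, z_T, T, n_euler=10):
    """Walk anchors t = T, ..., 1, solving each sub-segment as a small
    rectified-flow ODE d(zeta)/dm = u_theta(zeta, t, m) for m in [0, 1]."""
    z = np.asarray(z_T, dtype=float)
    dm = 1.0 / n_euler
    for t in range(T, 0, -1):
        zeta = z
        for k in range(n_euler):
            zeta = zeta + dm * u_theta(zeta, t, k * dm)
        z = zeta  # zeta(1) recovers z_{t-1}
    return z

# Toy check: with the exact (constant) segment velocity z_{t-1} - z_t,
# the sampler retraces a known anchor path back to z_0.
anchors = [np.array([float(t)]) for t in range(4)]  # z_0..z_3 = 0, 1, 2, 3
u_exact = lambda zeta, t, m: anchors[t - 1] - anchors[t]
print(np.allclose(sample_reverse(u_exact, anchors[3], T=3), anchors[0]))  # True
```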

Editor's term: "stochastic velocity field sampling" refers to generating forward trajectories by drawing fresh noise for velocities at each anchor point.

4. Efficiency, Diversity, and Theoretical Properties

Each MFM sub-path remains straight within its segment; thus, sampling is Euler-style with constant velocity per sub-segment, preserving the ODE-based efficiency of rectified flow. As $t$ increases, $\gamma^t$ decays, causing $\bm{v}_t$ to become dominated by Gaussian noise. This mechanism yields:

  • Efficiency near the data manifold ($\pi_0$): Early sub-paths are nearly deterministic, minimizing mode collapse and preserving sampling speed.
  • Diversity near the noise manifold ($\pi_1$): Late sub-paths inject high stochasticity, causing trajectories to fan out and span a wider region of the state space.

Theoretical regime interpolation: RF is recovered as $\gamma \rightarrow 1$ or $T = 1$, and standard diffusion emerges as $\gamma \rightarrow 0$ or $T \rightarrow \infty$. MFM continuously interpolates along this spectrum by tuning $\gamma$, controlling trajectory diversity and fidelity (Ma et al., 10 Jun 2025).
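
The interpolation claim is visible directly in the closed form: the weight $1 - \gamma^t$ on the injected noise vanishes as $\gamma \to 1$ (RF-like determinism) and saturates toward 1 as $\gamma \to 0$ (diffusion-like decorrelation). A two-line illustration (the helper name is ours):

```python
def noise_fraction(gamma: float, t: int) -> float:
    """Squared noise weight (1 - gamma^t) relative to beta^2, from the closed
    form v_t = sqrt(gamma^t) v_0 + sqrt(1 - gamma^t) beta eps."""
    return 1.0 - gamma**t

print(round(noise_fraction(0.999, 5), 3))  # 0.005: nearly deterministic (RF regime)
print(round(noise_fraction(0.05, 5), 3))   # 1.0: fully stochastic (diffusion regime)
```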

5. Empirical Evaluation

Experiments were conducted on CIFAR-10 (32×32), CelebA-HQ (256×256), ImageNet-32, and ImageNet-64, using a U-Net architecture. Training employed 70k steps, AdamW with learning rate $\approx 3 \times 10^{-4}$, batch size 16 (64 for ImageNet), and cosine annealing. Sampling performance is reported as Fréchet Inception Distance (FID) and recall, computed over 30k samples per evaluation.

| Dataset | Model | Steps | FID | Recall |
|---|---|---|---|---|
| CIFAR-10 | RF | 50 | 32.83 | 0.457 |
| CIFAR-10 | MFM ($N=2$) | 50 | 32.55 | 0.459 |
| CIFAR-10 | MFM ($N=5$) | 50 | 41.12 | – |
| CelebA-HQ | RF | 50 | 65.38 | 0.384 |
| CelebA-HQ | MFM ($N=2$) | 50 | 54.07 | 0.457 |
| CelebA-HQ | MFM ($N=5$) | 50 | >54.07 | – |

Across ImageNet-32 and ImageNet-64, MFM consistently demonstrated lower FID and higher recall at matched or lower numbers of function evaluations. Qualitative samples indicate sharper, more diverse facial features (e.g., eyes, teeth, accessories), attributed to increased trajectory diversity.

6. Ablations, Sensitivity, and Limitations

Ablation studies reveal:

  • Decay rate $\gamma$: Smaller $\gamma$ results in faster decorrelation of $\bm{v}_t$ and more diversity, but can lower fidelity if set too small early on. The optimal $\gamma$ depends on the number of anchors ($N$). For example, $\gamma = 0.99$ for $N = 2$ provided the best trade-off, while larger $N$ required $\gamma \approx 1$.
  • Anchor points ($N$): Increasing $N$ expands path exploration near $\pi_1$, but excessive $N$ with fixed $\gamma$ can diminish the deterministic guidance from initial anchors.

Limitations include the use of a fixed decay factor $\gamma$; a learnable or adaptive schedule $\gamma_t$ could further optimize the efficiency-diversity trade-off.

7. Extensions and Future Work

Potential improvements and research directions include:

  • Learning an adaptive or data-dependent decay schedule $\gamma_t$ for more nuanced control over path diversity.
  • Incorporating knowledge distillation to further reduce the required number of sampling steps.
  • Extending MFM to conditional or latent variable models.

In conclusion, MFM generalizes rectified flow by injecting controlled, time-dependent Gaussian noise in the velocity field, producing piecewise-straight trajectories that are deterministically guided near the data manifold and stochastically fan out near the noise manifold. This approach achieves competitive efficiency with rectified flow and recovers much of the multi-scale diversity advantageous in diffusion modeling (Ma et al., 10 Jun 2025).
