Momentum Flow Matching (MFM)

Updated 8 February 2026

MFM is a generative modeling paradigm that injects time-dependent Gaussian noise into its velocity field, balancing deterministic behavior near data with stochastic exploration near noise.
The method discretizes straight-line trajectories into segments using a recursive velocity field formulation, enabling efficient sampling and enhanced multi-scale noise modeling.
Empirical evaluations on datasets like CIFAR-10 and CelebA-HQ demonstrate that MFM can reduce FID and improve recall by tuning decay rates and anchor points to control the trade-off between efficiency and diversity.

Momentum Flow Matching (MFM) is a generative modeling paradigm that introduces a stochastic, velocity-field-based approach to sampling between data and noise distributions, generalizing rectified flow (RF) models by discretizing straight-line trajectories into a sequence of variable "momentum fields." MFM interpolates between the efficiency of fast, straight-path RF sampling and the diversity and multi-scale noise modeling of diffusion models by injecting time-dependent Gaussian noise directly into the velocity field at each step, rather than into the state variable. This architecture aims to balance fast, efficient sampling near the data manifold with increasingly stochastic, exploratory dynamics as the trajectory approaches the noise manifold, thereby improving generation diversity without incurring the computational costs of traditional diffusion.

1. Motivation and Limitations of Rectified Flow

Rectified flow (RF) models define a deterministic ordinary differential equation (ODE) with constant velocity:

$\dfrac{d\bm{x}_t}{dt} = \bm{v} = \bm{x}_1 - \bm{x}_0$

where $\bm{x}_0 \sim \bm{\pi}_0$ is a data sample and $\bm{x}_1 \sim \bm{\pi}_1$ is drawn from a noise distribution. This produces highly efficient, straight-line sampling trajectories requiring as little as one integration step. However, this fixed-velocity strategy restricts expressiveness:

Low diversity: All trajectories follow nearly the same path in data space, limiting sample variability.
Poor multi-scale noise modeling: There is no progressive corruption or denoising; the model is forced to ignore the granular evolution of noise scales over time.

While diffusion models address both issues via hundreds or thousands of stochastic steps, their computational overhead remains substantial. MFM seeks to blend the efficiency of RF with the richness of diffusion, traversing deterministic paths near $\pi_0$ and smoothly transitioning to highly stochastic exploration near $\pi_1$ (Ma et al., 10 Jun 2025).

2. Formulation and Learning Objective

MFM introduces a time-indexed velocity field $\bm{v}_t$ at each sub-path, determined recursively as follows:

Initialization:

$\bm{v}_0 = \beta (\bm{\epsilon}_0 - \bm{x}_0), \quad \bm{\epsilon}_0 \sim \mathcal{N}(0, I)$

Recursion for $0 < t < T$:

$\bm{v}_t = \sqrt{\gamma} \, \bm{v}_{t-1} + \sqrt{1-\gamma} \, \beta \, \bm{\epsilon}_t, \quad \bm{\epsilon}_t \sim \mathcal{N}(0, I)$

Terminal velocity:

$\bm{v}_T = \beta \, \bm{\epsilon}_T$

Here, $T$ is the number of anchor points, $\gamma$ ( $0 < \gamma < 1$ ) is a decay factor, and

$\beta = \frac{\sqrt{\gamma} - 1}{\sqrt{\gamma^T}-1}$

As a state update, each anchor point in the path is given by:

$\bm{z}_t = \bm{z}_{t-1} + \bm{v}_{t-1}, \quad \bm{z}_0 = \bm{x}_0$

The velocity $\bm{v}_t$ can also be expressed in closed form as:

$\bm{v}_t = \sqrt{\gamma^t} \,\bm{v}_0 + \sqrt{1-\gamma^t} \, \beta \bm{\epsilon}_t$

A neural network $u_\theta(\bm{z}, t)$ is trained to predict $\bm{v}_t$ at continuous time $t$ (scaled so $t \in [0,1]$ ). The learning objective ("momentum flow matching loss") is:

$L_{\mathrm{mfm}}(\theta) = \mathbb{E}_{t \sim U[0,1]} \left\| u_\theta(\bm{z}_t, t) - \bm{v}_t \right\|^2$

3. Algorithmic Structure and Sampling Procedures

MFM divides the straight-line path from $\bm{x}_0$ to $\bm{x}_1$ into $T$ segments using anchor points $\bm{z}_0, ..., \bm{z}_T$ . Each segment is governed by a time-varying velocity, evolving from deterministic ( $t=0$ ) to fully stochastic ( $t=T$ ). MFM departs from classical noise injection by corrupting the velocity $\bm{v}_t$ at each step via the aforementioned recursion, rather than corrupting the state $\bm{x}$ directly.

Forward Process (Trajectory Generation)

Initialize $\bm{z}_0 = \bm{x}_0$ , $\bm{v}_0 = \beta (\bm{\epsilon}_0 - \bm{x}_0)$ .
Iterate for $t=1$ to $T$ :
- Draw $\bm{\epsilon}_t \sim \mathcal{N}(0, I)$ .
- Compute $\bm{z}_t = \bm{z}_{t-1} + \bm{v}_{t-1}$ .
- Update $\bm{v}_t = \sqrt{\gamma}\,\bm{v}_{t-1} + \sqrt{1-\gamma}\,\beta \bm{\epsilon}_t$ .

Reverse Process (Sampling from $\bm{\pi}_0$ )

Start from $\bm{z}_T \sim \bm{\pi}_1$ .
For each $t=T$ down to $1$:
- Treat each sub-segment as a rectified-flow matching problem.
- Use the trained $u_\theta$ to solve $\frac{d\zeta}{dm} = u_\theta(\zeta, m)$ , starting from $\zeta(0) = \bm{z}_t$ and integrating to $m=1$ to recover $\bm{z}_{t-1}$ .

Editor's term: "stochastic velocity field sampling" refers to generating forward trajectories by drawing fresh noise for velocities at each anchor point.

4. Efficiency, Diversity, and Theoretical Properties

Each MFM sub-path remains straight within its segment; thus, sampling is Euler-style with constant velocity per sub-segment, preserving the ODE-based efficiency of rectified flow. As $t$ increases, $\gamma^t$ decays, causing $\bm{v}_t$ to become dominated by Gaussian noise. This mechanism yields:

Efficiency near the data manifold ( $\bm{\pi}_0$ ): Early sub-paths are nearly deterministic, minimizing mode collapse and preserving modeling speed.
Diversity near the noise manifold ( $\bm{\pi}_1$ ): Late sub-paths inject high stochasticity, causing trajectories to fan out and span a wider region of the state space.

Theoretical regime interpolation: RF is recovered as $\gamma \rightarrow 1$ or $T=1$ , and standard diffusion emerges as $\gamma \rightarrow 0$ or $T \rightarrow \infty$ . MFM continuously interpolates this spectrum by tuning $\gamma$ , controlling the trajectory diversity and fidelity (Ma et al., 10 Jun 2025).

5. Empirical Evaluation

Experiments were conducted on CIFAR-10 (32×32), CelebA-HQ (256×256), ImageNet-32, and ImageNet-64, using a U-Net architecture. Training employed 70k steps, AdamW with $\mathrm{lr} \approx 3 \times 10^{-4}$ , batch size 16 (64 for ImageNet), and cosine annealing. Sampling performance is reported in Fréchet Inception Distance (FID) and recall, with 30k samples per evaluation.

Dataset	Model	Steps	FID	Recall
CIFAR-10	RF	50	32.83	0.457
CIFAR-10	MFM ( $N=2$ )	50	32.55	0.459
CIFAR-10	MFM ( $N=5$ )	50	41.12	–
CelebA-HQ	RF	50	65.38	0.384
CelebA-HQ	MFM ( $N=2$ )	50	54.07	0.457
CelebA-HQ	MFM ( $N=5$ )	50	>54.07	–

Across ImageNet-32 and ImageNet-64, MFM consistently demonstrated lower FID and higher recall at matched or lower numbers of function evaluations. Qualitative samples indicate sharper, more diverse facial features (e.g., eyes, teeth, accessories), attributed to increased trajectory diversity.

6. Ablations, Sensitivity, and Limitations

Ablation studies reveal:

Decay rate $\gamma$ : Smaller $\gamma$ results in faster decorrelation of $\bm{v}_t$ and more diversity, but can lower fidelity if too small early on. Optimal $\gamma$ depends on the number of anchors ( $N$ ). For example, $\gamma=0.99$ for $N=2$ provided the best trade-off, while larger $N$ required $\gamma \approx 1$ .
Anchor points ( $N$ ): Increasing $N$ expands path exploration near $\bm{\pi}_1$ , but excessive $N$ with fixed $\gamma$ can diminish deterministic guidance from initial anchors.

Limitations include the use of a fixed $\gamma_t$ ; a learnable or adaptive schedule for $\gamma_t$ could further optimize the efficiency-diversity trade-off.

7. Extensions and Future Work

Potential improvements and research directions include:

Learning an adaptive or data-dependent decay schedule $\gamma_t$ for more nuanced control over path diversity.
Incorporating knowledge distillation to further reduce the required number of sampling steps.
Extending MFM to conditional or latent variable models.

In conclusion, MFM generalizes rectified flow by injecting controlled, time-dependent Gaussian noise in the velocity field, producing piecewise-straight trajectories that are deterministically guided near the data manifold and stochastically fan out near the noise manifold. This approach achieves competitive efficiency with rectified flow and recovers much of the multi-scale diversity advantageous in diffusion modeling (Ma et al., 10 Jun 2025).

Markdown Report Issue Upgrade to Chat

References (1)

Flow Diverse and Efficient: Learning Momentum Flow Matching via Stochastic Velocity Field Sampling (2025)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Momentum Flow Matching (MFM).

Momentum Flow Matching (MFM)

1. Motivation and Limitations of Rectified Flow

2. Formulation and Learning Objective

3. Algorithmic Structure and Sampling Procedures

Forward Process (Trajectory Generation)

Reverse Process (Sampling from $\bm{\pi}_0$ )

4. Efficiency, Diversity, and Theoretical Properties

5. Empirical Evaluation

6. Ablations, Sensitivity, and Limitations

7. Extensions and Future Work

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Don't miss out on important new AI/ML research

Momentum Flow Matching (MFM)

1. Motivation and Limitations of Rectified Flow

2. Formulation and Learning Objective

3. Algorithmic Structure and Sampling Procedures

Forward Process (Trajectory Generation)

Reverse Process (Sampling from π0\bm{\pi}_0π0​)

4. Efficiency, Diversity, and Theoretical Properties

5. Empirical Evaluation

6. Ablations, Sensitivity, and Limitations

7. Extensions and Future Work

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Related Topics

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research

Reverse Process (Sampling from $\bm{\pi}_0$ )