
MeanFlow-Based Generative Model

Updated 12 August 2025
  • MeanFlow-based models are generative frameworks that replace instantaneous velocity with an interval-averaged (mean) velocity, enabling direct mapping from noise to data.
  • The approach utilizes the MeanFlow Identity to relate average and instantaneous velocities, supporting one- or few-step generation with efficient training.
  • Empirical evaluations show state-of-the-art results in image synthesis, audio generation, and policy learning, highlighting practical benefits in speed and consistency.

A MeanFlow-based model refers to a class of generative and inference frameworks that replace the classic instantaneous velocity field used in flow matching with an interval-averaged (mean) velocity, thereby enabling direct, efficient mappings from noise to data. By formulating an explicit relationship—termed the MeanFlow Identity—between average and instantaneous velocities, these models support one- or few-step generation, rapid inference, and principled consistency, with robust empirical performance across vision, audio, policy learning, and other modalities (Geng et al., 19 May 2025, Sheng et al., 14 Jul 2025, Li et al., 8 Aug 2025, Guo et al., 22 Jul 2025, Cao et al., 9 Aug 2025).

1. Mathematical Foundations of MeanFlow

MeanFlow-based models are grounded in integral formulations of dynamical systems that parametrize flows not by their instantaneous velocities $v(z, t)$, but by their average velocity $u(z_t, r, t)$ over an interval $[r, t]$:

$$u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau$$

A central theoretical result is the MeanFlow Identity, which relates the average velocity to the instantaneous velocity:

$$u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt}u(z_t, r, t)$$

Here, $\frac{d}{dt}u(z_t, r, t)$ is the total derivative along the trajectory, which decomposes via the chain rule into terms involving both $\partial_z u$ and $\partial_t u$.
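Explicitly, since $z_t$ evolves with velocity $v$, the chain rule gives:

$$\frac{d}{dt}u(z_t, r, t) = v(z_t, t)\,\partial_z u(z_t, r, t) + \partial_t u(z_t, r, t)$$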

The MeanFlow model uses this identity as a training target: a neural network is trained to output $u_\theta(z, r, t)$ such that, given a stochastic trajectory $z_t$ (interpolated between data and noise), the network's output satisfies the identity throughout training.

This approach allows the direct mapping from prior noise (e.g., $z_1 \sim \mathcal{N}(0, I)$) to a data sample in a single inference step:

$$z_0 = z_1 - u_\theta(z_1, 0, 1)$$
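As a sanity check, consider a toy flow whose velocity depends only on time, $v(z, t) = t$ (an illustrative assumption, not from the referenced papers); the average velocity over $[r, t]$ is then $u = (t + r)/2$ in closed form, and the MeanFlow Identity can be verified numerically:

```python
# Verify the MeanFlow Identity for the toy flow v(z, t) = t, where the
# average velocity over [r, t] is u = (t + r) / 2 in closed form.
def u(r, t):
    return (t + r) / 2.0

def identity_rhs(r, t, eps=1e-6):
    # v(z_t, t) - (t - r) * d/dt u; since this toy u has no z-dependence,
    # the total derivative reduces to a finite difference in t.
    du_dt = (u(r, t + eps) - u(r, t - eps)) / (2 * eps)
    return t - (t - r) * du_dt

r, t = 0.2, 0.9
assert abs(u(r, t) - identity_rhs(r, t)) < 1e-8
```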

2. Implementation Strategies and Loss Formulation

At training time, the following steps are performed:

  1. Draw a pair of times $(r, t)$ with $t > r$, a data sample $x$, and latent noise $e$.
  2. Compute the intermediate flow state $z = (1 - t)\,x + t\,e$.
  3. Calculate the conditional velocity $v_t$ (for the "straight" flow schedule, usually $v_t = e - x$).
  4. To evaluate the right-hand side of the MeanFlow Identity, compute a Jacobian–vector product (JVP) to obtain the derivatives of $u_\theta$ with respect to $z$ and $t$.
  5. Define the loss as:

$$\operatorname{Loss}(\theta) = \mathbb{E}\left[ \left\| u_\theta(z, r, t) - \mathrm{sg}\!\left(v_t - (t - r)\left(v_t \cdot \partial_z u_\theta + \partial_t u_\theta\right)\right) \right\|_2^2 \right]$$

where $\mathrm{sg}(\cdot)$ denotes a stop-gradient operation that prevents higher-order derivatives during backpropagation.

No pretraining, distillation, or curriculum learning is required. For practical efficiency, time pairs $(r, t)$ are commonly sampled from a logit-normal or uniform distribution, and JVPs are computed using deep learning frameworks' automatic differentiation (e.g., torch.func.jvp, jax.jvp).
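The training procedure above can be sketched in PyTorch. The MLP, data dimensions, and time sampling below are illustrative assumptions, not the architecture of any referenced paper:

```python
import torch
from torch.func import jvp

# Toy network u_theta(z, r, t); a small MLP stands in for the real model.
class MeanFlowNet(torch.nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 2, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, z, r, t):
        return self.net(torch.cat([z, r, t], dim=-1))

def meanflow_loss(model, x, e, r, t):
    z = (1 - t) * x + t * e   # intermediate flow state
    v = e - x                 # conditional velocity (straight schedule)
    # Pushing the tangent (v, 0, 1) through the network gives the total
    # derivative d/dt u = v . grad_z u + du/dt in one forward pass.
    u, du_dt = jvp(model, (z, r, t), (v, torch.zeros_like(r), torch.ones_like(t)))
    target = (v - (t - r) * du_dt).detach()  # stop-gradient on the target
    return ((u - target) ** 2).mean()

model = MeanFlowNet()
x, e = torch.randn(8, 2), torch.randn(8, 2)
t = torch.rand(8, 1)
r = t * torch.rand(8, 1)  # ensures 0 <= r < t per sample
loss = meanflow_loss(model, x, e, r, t)
```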

At inference, one directly applies $z_0 = z_1 - u_\theta(z_1, 0, 1)$, realizing one-step generation.
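One-step sampling is then a single subtraction; a minimal sketch, assuming a trained network with the (hypothetical) signature `model(z, r, t)`:

```python
import torch

# One-step MeanFlow sampling: map prior noise directly to data.
def sample_one_step(model, shape):
    z1 = torch.randn(shape)            # z_1 ~ N(0, I)
    r = torch.zeros(shape[0], 1)
    t = torch.ones(shape[0], 1)
    return z1 - model(z1, r, t)        # z_0 = z_1 - u_theta(z_1, 0, 1)
```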

3. Empirical Performance and Applications

MeanFlow-based models have demonstrated state-of-the-art results in multiple domains:

  • Image Synthesis: Achieves Fréchet Inception Distance (FID) of 3.43 on ImageNet 256×256 with a single function evaluation (1-NFE), outperforming previous one-step diffusion and flow-based models (Geng et al., 19 May 2025).
  • Audio Synthesis: MeanAudio attains a real-time factor (RTF) of 0.013—enabling a $100\times$ inference speedup over previous diffusion-based systems, while preserving synthesis quality (Li et al., 8 Aug 2025).
  • Robotic Policy Learning: MP1, which employs the MeanFlow paradigm, obtains superior task success (e.g., 10.2% better than DP3 on the Adroit and Meta-World benchmarks) and achieves $19\times$ faster inference relative to iterative diffusion policy models (Sheng et al., 14 Jul 2025).
  • Reinforcement Learning: MeanFlow policy parametrizations in Flow Policy Mirror Descent yield comparable MuJoCo benchmark performance to diffusion policies while reducing the number of inference function evaluations by several orders of magnitude (Chen et al., 31 Jul 2025).
  • Recommender Systems: FMRec leverages a similar flow matching structure for deterministic and efficient sequential recommendations (Liu et al., 22 May 2025).
  • Speech Synthesis: SplitMeanFlow, an algebraic generalization, achieves a $20\times$ speedup in real-world text-to-speech applications (Guo et al., 22 Jul 2025).

MeanFlow enables one-step or few-step sampling with little or no degradation in generative quality compared to iterative models.

4. Extensions and Algorithmic Enhancements

Several advancements generalize or refine the MeanFlow approach:

  • Interval Splitting Consistency (SplitMeanFlow): Replaces derivative-based objectives with a purely algebraic constraint reflecting integral additivity. The Interval Splitting Consistency is formulated as:

$$(t - r)\,u(z_t, r, t) = (s - r)\,u(z_s, r, s) + (t - s)\,u(z_t, s, t)$$

for any $r < s < t$, and is leveraged for more stable and efficient training without requiring Jacobian computations (Guo et al., 22 Jul 2025).

  • High-Order MeanFlow: Second-Order MeanFlow further incorporates average acceleration, uses a generalized additive identity for average acceleration, and demonstrates that the resulting sampling algorithm resides in $\mathsf{TC}^0$ (constant-depth threshold circuits), optimizing both expressivity and hardware efficiency (Cao et al., 9 Aug 2025).
  • Classifier-Free Guidance (CFG): MeanAudio and MP1 tightly integrate CFG directly into their training objectives rather than as a post-hoc sampling modification, allowing for controllable generations without increasing inference cost (Li et al., 8 Aug 2025, Sheng et al., 14 Jul 2025).
  • Curriculum and Mix-up Strategies: MeanAudio introduces an instantaneous-to-mean curriculum, blending standard (instantaneous) and mean flow matching during training for increased stability and convergence rate (Li et al., 8 Aug 2025).
  • Dispersive Loss: MP1 adds a loss that repels encoded representations for different states, aiding generalization in low-data regimes (Sheng et al., 14 Jul 2025).
  • Deterministic Reverse Sampling: FMRec uses a straight-flow ODE with an Euler solver, which is exact for linear trajectories, further reducing sampling noise and inference cost (Liu et al., 22 May 2025).
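The Interval Splitting Consistency above can be checked directly on the toy time-only flow $v(\tau) = \tau$, whose average velocity $u(r, t) = (t + r)/2$ is known in closed form (an illustrative assumption, not a trained model):

```python
# Check the Interval Splitting Consistency (SplitMeanFlow) for the toy
# flow v(tau) = tau, where u(r, t) = (t + r) / 2 in closed form.
def u(r, t):
    return (t + r) / 2.0

def split_residual(r, s, t):
    # (t - r) u(r, t) should equal (s - r) u(r, s) + (t - s) u(s, t)
    return (t - r) * u(r, t) - ((s - r) * u(r, s) + (t - s) * u(s, t))

assert abs(split_residual(0.1, 0.4, 0.9)) < 1e-12
```

Note that the check involves only function evaluations at three time points—no Jacobian–vector products—which is the practical appeal of the algebraic formulation.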

5. Theoretical Implications and Computational Guarantees

MeanFlow-based models and their algebraic generalizations provide several guarantees and theoretical properties:

  • Consistency: Satisfaction of the MeanFlow identity (or its algebraic analogs) ensures that the generated trajectories are self-consistent, either in the differential (MeanFlow) or algebraic (SplitMeanFlow) sense.
  • Sampling Efficiency: By summarizing the entire flow as an average (rather than integrating small steps), one-step or few-step sampling is possible without introducing large discretization errors. In policy learning, the discretization error in one-step sampling is controlled by the variance of the target distribution and vanishes as the policy approaches determinism (Chen et al., 31 Jul 2025).
  • Expressivity: The circuit complexity of second-order MeanFlow sampling (via transformer networks) remains in constant-depth, polynomial-size threshold circuits, ensuring practical scalability even with richer dynamical representations (Cao et al., 9 Aug 2025).
  • Hardware Compatibility: The move towards algebraic objectives (as in SplitMeanFlow) and fast approximate attention mechanisms enables improved scalability on both conventional and specialized accelerator hardware (Guo et al., 22 Jul 2025, Cao et al., 9 Aug 2025).

6. Broader Impact and Future Directions

The MeanFlow-based paradigm has contributed to closing the gap between one-step and classic multi-step generative models, particularly in tasks requiring real-time inference and efficient computation (e.g., robotics, text-to-audio, policy learning).

Future directions include:

  • Further exploration of high-order (e.g., second-order) MeanFlow objectives, leveraging curvature and higher-order dynamics for increased expressivity without trading off sampling efficiency (Cao et al., 9 Aug 2025).
  • Generalized integral consistency principles (Interval Splitting Consistency) that bypass the need for Jacobian computations, making deployment more robust and accessible (Guo et al., 22 Jul 2025).
  • Cross-domain applications in simulation-based modeling, data assimilation in the physical sciences, and reinforcement learning exploitation–exploration dynamics.

By rigorously connecting integral consistency, hardware efficiency, and generative expressivity, MeanFlow-based models provide a theoretically principled and practically powerful foundation for modern high-speed generative modeling and inference.