
MeanFlow-Accelerated Generative Model

Updated 15 September 2025
  • MeanFlow directly learns average velocity fields, bypassing iterative simulation and traditional instantaneous flow matching.
  • Its architecture employs modular networks with positional embeddings and gradient modulation, enabling stable training and effective conditional sampling.
  • Empirical evaluations demonstrate competitive or state-of-the-art sample quality with rapid convergence in image synthesis, multimodal mapping, and physical simulations.

A MeanFlow-Accelerated Model is a generative modeling paradigm that replaces the standard instantaneous velocity-based flow-matching or rectified flow parameterizations with direct learning of time-averaged velocity fields. This approach enables simulation-free, one-step or few-step sample generation across domains such as image synthesis, multimodal video-to-audio mapping, and even inhomogeneous gas dynamics. Recent theoretical and empirical results demonstrate that MeanFlow and its modular and higher-order extensions achieve competitive or state-of-the-art sample quality while providing scalable training, robust convergence, and significant acceleration—often without requiring pretraining, distillation, or iterative curriculum schedules.

1. Theoretical Underpinnings: Average Velocity and the MeanFlow Identity

Conventional flow-matching techniques train neural networks to predict the instantaneous velocity field $v(z_t, t)$ governing sample evolution along a prescribed ODE trajectory. The MeanFlow-Accelerated Model introduces the concept of average velocity $u(z_t, r, t)$, defined as

$$u(z_t, r, t) \equiv \frac{1}{t-r} \int_{r}^{t} v(z_\tau, \tau)\, d\tau,$$

where $z_\tau$ traces the trajectory from prior to data distribution and $[r, t]$ is an arbitrary time interval. The core analytical result is the MeanFlow Identity:

$$u(z_t, r, t) = v(z_t, t) - (t - r) \cdot \frac{d}{dt} u(z_t, r, t),$$

with the total derivative $\frac{d}{dt}u(z_t, r, t) = v(z_t, t)\cdot \frac{\partial u}{\partial z} + \frac{\partial u}{\partial t}$. This identity links the time-averaged field to the instantaneous generator, permitting regression-based training objectives that align the learned average velocity with empirical displacements and eliminating the need to simulate or integrate through intermediate steps (Geng et al., 19 May 2025, You et al., 24 Aug 2025). Higher-order generalizations, such as Second-Order MeanFlow, define average acceleration fields and prove analogous consistency relationships:

$$(t - r)\, \bar{a}(z_t, r, t) = (s - r)\, \bar{a}(z_s, r, s) + (t - s)\, \bar{a}(z_t, s, t).$$

Such constructs enable direct modeling of non-linear curvature effects in generated flows (Cao et al., 9 Aug 2025).
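
Both relations can be checked numerically on a toy field. The sketch below is a minimal illustration, not any paper's implementation: it assumes the state-independent instantaneous field $v(z, t) = t$, whose average velocity has the closed form $u = (t + r)/2$, verifies the MeanFlow Identity by quadrature and finite differences, and checks the velocity analogue of the interval-additivity relation.

```python
# Numerical check of the MeanFlow Identity on a toy flow.
# Toy assumption: v(z, t) = t (state-independent), so the average
# velocity over [r, t] is (t + r) / 2 in closed form.

def v(z, t):
    """Instantaneous velocity field (illustrative choice)."""
    return t

def u_avg(z, r, t, n=10_000):
    """Average velocity (1/(t-r)) * int_r^t v(z_tau, tau) d tau,
    approximated with a midpoint Riemann sum."""
    h = (t - r) / n
    return sum(v(None, r + (k + 0.5) * h) for k in range(n)) / n

def du_dt(z, r, t, eps=1e-5):
    """Total derivative d/dt u = v * du/dz + du/dt; du/dz vanishes
    for this state-independent toy field, leaving the t-partial."""
    return (u_avg(z, r, t + eps) - u_avg(z, r, t - eps)) / (2 * eps)

r, t = 0.2, 0.9
lhs = u_avg(None, r, t)                         # closed form: (t + r)/2 = 0.55
rhs = v(None, t) - (t - r) * du_dt(None, r, t)  # identity right-hand side
print(f"u = {lhs:.6f}, identity RHS = {rhs:.6f}")   # both 0.550000

# Interval additivity (the velocity analogue of the second-order relation):
s = 0.5
lhs_add = (t - r) * u_avg(None, r, t)
rhs_add = (s - r) * u_avg(None, r, s) + (t - s) * u_avg(None, s, t)
print(f"additivity: {lhs_add:.6f} vs {rhs_add:.6f}")  # both 0.385000
```

Any sufficiently smooth $v$ could be substituted; only the closed-form comparison values are specific to this toy choice.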

2. Model Architectures and Loss Design

MeanFlow networks parameterize $u_\theta(z_t, r, t)$ as a function of both the latent state and two time variables, typically encoded by sinusoidal or learned positional embeddings. This architecture generalizes instantaneous velocity networks to continuous time-space domains and supports conditional sampling via classifier-free guidance (CFG). The training process minimizes an L2 regression loss derived from the MeanFlow identity:

$$\mathcal{L}(\theta) = \mathbb{E}\left[ \left\| u_\theta(z_t, r, t) - \mathrm{sg}\left( v(z_t, t) - (t-r)\left(v(z_t, t) \cdot \partial_z u_\theta + \partial_t u_\theta\right) \right) \right\|^2 \right],$$

where $\mathrm{sg}$ denotes the stop-gradient operator.
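
As a rough sketch of how this target is assembled, the toy below builds the bracketed quantity with central finite differences standing in for the Jacobian-vector products that real implementations obtain from an autodiff framework; the candidate network and field values are illustrative assumptions.

```python
# Toy construction of the MeanFlow regression target
# sg( v - (t - r) * (v * du/dz + du/dt) ), framework-free.

def meanflow_target(u_theta, v_t, z, r, t, eps=1e-5):
    """Build the stop-gradient target; the partials of u_theta are taken
    by central finite differences (autodiff JVPs in practice). Returning
    a plain float makes the stop-gradient implicit here."""
    du_dz = (u_theta(z + eps, r, t) - u_theta(z - eps, r, t)) / (2 * eps)
    du_dt = (u_theta(z, r, t + eps) - u_theta(z, r, t - eps)) / (2 * eps)
    return v_t - (t - r) * (v_t * du_dz + du_dt)

# Candidate network guess (illustrative): u_theta = (t + r) / 2, which is
# exact for the state-independent toy field v(z, t) = t.
u_theta = lambda z, r, t: (t + r) / 2
z, r, t = 0.3, 0.2, 0.9
target = meanflow_target(u_theta, v_t=t, z=z, r=r, t=t)
loss = (u_theta(z, r, t) - target) ** 2
print(f"target = {target:.6f}")   # 0.550000
print(f"loss = {loss:.1e}")       # ~0, since u_theta already satisfies the identity
```

Because this $u_\theta$ already satisfies the identity, the loss is numerically zero; a mismatched candidate would yield a non-trivial gradient signal.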

Modular MeanFlow introduces a gradient modulation mechanism permitting robust interpolation between full- and stop-gradient propagation, stabilizing training against exploding higher-order derivatives:

$$\mathrm{SG}_\lambda[z] = \lambda \cdot z + (1-\lambda)\cdot \mathrm{stopgrad}(z),$$

with $\lambda$ linearly increased via a curriculum-style warmup to balance coarse and fine supervision (You et al., 24 Aug 2025). Complex architectures such as Synchformer encoders or cross-modal attention layers are employed in multimodal applications (Yang et al., 8 Sep 2025).
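
A minimal sketch of the modulated stop-gradient and a linear warmup schedule follows (function names and the schedule shape are assumptions; in an autodiff framework the interpolation would be written with `detach`/`stop_gradient` on the second term):

```python
# SG_lambda[z] = lam * z + (1 - lam) * stopgrad(z): the forward value is
# unchanged, while the gradient flowing through is scaled by lam.

def sg_lambda(value, grad, lam):
    """Emulate the modulated stop-gradient on a (value, gradient) pair."""
    return value, lam * grad

def lambda_warmup(step, warmup_steps, lam_max=1.0):
    """Linearly ramp lambda from 0 (pure stop-gradient) to lam_max."""
    return lam_max * min(step / warmup_steps, 1.0)

# Early in training, gradients through the target are fully blocked;
# after warmup, they flow unmodified.
for step in (0, 500, 1000, 2000):
    lam = lambda_warmup(step, warmup_steps=1000)
    val, g = sg_lambda(value=0.55, grad=1.0, lam=lam)
    print(f"step {step:4d}: lambda = {lam:.2f}, value = {val:.2f}, grad scale = {g:.2f}")
```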

3. Accelerated Sampling and One-Step Generation

A central feature of the MeanFlow approach is native support for direct one-step mapping:

$$z_0 = z_1 - u(z_1, 0, 1),$$

eliminating the need for multi-step ODE solvers or trajectory-straightening procedures as in PeRFlow or CAF (Yan et al., 13 May 2024, Park et al., 1 Nov 2024). In modular and higher-order extensions, the update may incorporate time derivatives or acceleration components, enhancing trajectory fidelity:

$$x_r \approx x_t - (t - r)\, u(x_t, r, t), \quad \text{or} \quad x_r \approx x_t - (t-r)\, u_1 - \frac{1}{2}(t-r)^2 u_2,$$

where $u_1, u_2$ are learned velocity and acceleration fields, respectively (Cao et al., 9 Aug 2025, You et al., 24 Aug 2025). In the context of classifier-free guidance for conditional tasks, a scalar rescaling mechanism aligns conditional and unconditional updates, mitigating CFG-induced distortion artifacts and preserving semantic fidelity in one-shot generation:

$$u_\theta^{\mathrm{cfg\text{-}scaled}} = \omega \cdot u_\theta(\cdot \mid c) + (1-\omega)\, s \cdot u_\theta(\cdot \mid \varnothing), \qquad s = \frac{u_\theta(\cdot \mid c)^\top u_\theta(\cdot \mid \varnothing)}{\|u_\theta(\cdot \mid \varnothing)\|^2}$$

(Yang et al., 8 Sep 2025).
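
The one-step update and the scalar-rescaled guidance combination above can be sketched together on a toy average-velocity field (all concrete values below are illustrative assumptions, not model outputs):

```python
# One-step sampling on the toy field v(z, t) = t, whose exact average
# velocity is u(z, r, t) = (t + r) / 2.

def u(z, r, t):
    return (t + r) / 2

z0 = 0.3
z1 = z0 + 1.0 * u(z0, 0.0, 1.0)   # exact displacement over [0, 1]
z0_hat = z1 - u(z1, 0.0, 1.0)     # z0 = z1 - u(z1, 0, 1): one evaluation, no solver
print(f"recovered z0 = {z0_hat:.4f}")   # 0.3000

# Scalar-rescaled CFG: s projects the unconditional output onto the
# conditional one before the omega-weighted combination.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cfg_scaled(u_cond, u_uncond, omega):
    s = dot(u_cond, u_uncond) / dot(u_uncond, u_uncond)
    return [omega * c + (1 - omega) * s * uu
            for c, uu in zip(u_cond, u_uncond)]

out = cfg_scaled([1.0, 0.5], [0.8, 0.1], omega=2.0)   # toy vectors
print([round(x, 4) for x in out])
```

With $\omega = 1$ the rescaling is inert and the update reduces to the conditional output.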

4. Empirical Evaluation and Performance Metrics

MeanFlow-Accelerated Models demonstrate empirical superiority on large-scale generative benchmarks:

  • On ImageNet 256×256, MeanFlow achieves an FID of 3.43 with 1-NFE sampling, substantially outperforming previous one-step methods (Shortcut, IMM) with FID ≥10 (Geng et al., 19 May 2025).
  • MMF yields lower FID and faster convergence than consistency models and other baselines, with robustness under out-of-distribution and low-data regimes (You et al., 24 Aug 2025).
  • Multimodal MF-MJT attains real-time factors (RTF) as low as 0.007 on VTA synthesis, corresponding to up to 500× speedup compared to iterative schemes, while preserving high perceptual quality (FAD, FD, KL metrics), semantic alignment, and synchronization (Yang et al., 8 Sep 2025).
  • Higher-order MeanFlow yields theoretical error improvements ($O((t-r)^3)$ vs $O((t-r)^2)$), with sampling procedures provably parallelizable and scalable via TC⁰-uniform threshold circuits (Cao et al., 9 Aug 2025).
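
The quoted error orders can be illustrated with truncated Taylor updates standing in for learned fields on a toy trajectory with curvature (an illustrative assumption; the papers' parameterization and sign conventions for $u_1, u_2$ differ): halving the step shrinks the one-term error by roughly $4\times$ and the two-term error by roughly $8\times$, matching $O((t-r)^2)$ vs $O((t-r)^3)$.

```python
# Error-order illustration on the toy trajectory x(t) = t^3 / 3,
# which has non-zero curvature (v = t^2, a = 2t).

def x(t):
    return t ** 3 / 3

def v(t):            # instantaneous velocity dx/dt
    return t ** 2

def a(t):            # acceleration dv/dt
    return 2 * t

t = 1.0
for h in (0.2, 0.1, 0.05):
    r = t - h
    first  = x(t) - h * v(t)                      # one term:  error O(h^2)
    second = x(t) - h * v(t) + 0.5 * h**2 * a(t)  # two terms: error O(h^3)
    e1, e2 = abs(first - x(r)), abs(second - x(r))
    print(f"h={h:.2f}  first err={e1:.2e}  second err={e2:.2e}")
```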

5. Extensions and Applications

MeanFlow methodology has been successfully applied to:

  • Multimodal video-to-audio and text-to-audio synthesis, where direct one-step generation achieves both efficiency and quality (Yang et al., 8 Sep 2025).
  • Physical simulation models for pressureless gas dynamics, using average velocity and acceleration fields to reconstruct cluster evolution, momentum conservation, and congestion effects in particle ensembles (Moudoumou et al., 8 Sep 2025).
  • Instance-aware diffusion acceleration, as in RayFlow, where adaptive target means guide sample-specific trajectories, improving controllability and sample diversity (Shao et al., 10 Mar 2025).

The option to incorporate classifier-free guidance, curriculum training, and fast approximate attention mechanisms further highlights the versatility of this paradigm in domains requiring rapid, robust, and high-fidelity mapping from complex latent representations.

6. Relationship to Prior Acceleration Methods

MeanFlow-Accelerated Models unify and generalize consistency-based (discrepancy minimization between time steps) and flow-matching (instantaneous velocity regression) approaches. Unlike PeRFlow's piecewise rectification or CAF's constant acceleration assumption (which retain trajectory-based simulation and pretraining dependencies) (Yan et al., 13 May 2024, Park et al., 1 Nov 2024), MeanFlow supports sample-to-data mapping without trajectory splitting, distillation, or manually tuned step schedules. Modular variants avoid expensive higher-order Jacobian computations, and higher-order variants guarantee scalability via provably efficient circuit complexity criteria (Cao et al., 9 Aug 2025, You et al., 24 Aug 2025).

7. Future Directions and Open Challenges

Research into MeanFlow-Accelerated Models has spurred investigations into:

  • Deeper theoretical connections between time-averaged and instantaneous dynamics, including regularity and expressivity implications of high-order consistency conditions (Cao et al., 9 Aug 2025).
  • Extensions to physics-based simulation and free-boundary problems, leveraging conditional expectation and velocity-space mass redistribution as illustrated in generalized sticky particle models (Moudoumou et al., 8 Sep 2025).
  • Improved scheduling, guidance, and robust scalar rescaling methods for extremely conditional or multimodal inference regimes (Yang et al., 8 Sep 2025).

A plausible implication is that future generative modeling methods will further integrate average velocity or acceleration-informed fields, and leverage modular framework extensions, to achieve improved scalability, fidelity, stability, and broader domain applicability—including those with irregular sample or dynamical structure.


This synthesis is based strictly on the content and results reported in papers (Geng et al., 19 May 2025, You et al., 24 Aug 2025, Cao et al., 9 Aug 2025, Yang et al., 8 Sep 2025, Shao et al., 10 Mar 2025, Park et al., 1 Nov 2024, Yan et al., 13 May 2024), and (Moudoumou et al., 8 Sep 2025).
