
MeanFlow: One-Step Generative Modeling

Updated 1 December 2025
  • MeanFlow is a generative modeling framework that maps noise to data in one step by regressing interval-averaged velocity.
  • It is built on an ODE identity relating mean and instantaneous velocities, enforced during training via Jacobian-vector products (JVPs), which connects diffusion, flow-matching, and consistency models.
  • Empirical benchmarks show significant speedups (10–100×) and competitive quality across image, audio, and video synthesis tasks.

MeanFlow is a principled framework for one-step generative modeling based on learning the interval-averaged (mean) velocity of a transformation, in contrast to conventional flow matching, which relies on iterative integration of instantaneous velocity fields. By regressing the mean velocity over an interval, MeanFlow maps noise to data in a single function evaluation, dramatically accelerating inference compared to diffusion and multistep flow-based models. The MeanFlow paradigm has been validated across image synthesis, audio generation, speech enhancement, video-to-audio, policy learning, and representation compression, achieving competitive or state-of-the-art quality at a fraction of the sampling cost.

1. Mathematical Foundations of MeanFlow

MeanFlow models the transport from a simple distribution (e.g., Gaussian noise) to a data distribution by the ODE formulation

$$\frac{dz}{dt} = v(z, t)$$

where $v(z, t)$ is the instantaneous velocity field. Standard flow matching learns $v$ locally, supporting only small steps and necessitating iterative solvers.

MeanFlow introduces the interval-averaged velocity

$$u(z_t, r, t) = \frac{1}{t-r} \int_r^t v(z_s, s)\, ds$$

where $z_s$ evolves according to $v$ from $z_r = x$ to $z_t$. The core identity relates the mean and instantaneous velocities:

$$u(z_t, r, t) = v(z_t, t) - (t-r)\left[ v(z_t, t)\cdot\nabla_{z}u(z_t, r, t) + \partial_t u(z_t, r, t) \right]$$

This identity (enforced via Jacobian-vector products, JVPs) enables a regression target for training. In the canonical one-step case ($r=0$, $t=1$):

$$x = \epsilon - u_\theta(\epsilon, 0, 1), \qquad \epsilon \sim \mathcal{N}(0, I)$$

Generation thus requires a single network evaluation (Geng et al., 19 May 2025, Agarwal et al., 26 Nov 2025, You et al., 24 Aug 2025).
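The one-step sampling rule above translates directly into code. Below is a minimal sketch, assuming a trained network `u_theta(z, r, t)` that returns the average velocity over $[r, t]$; the function name, signature, and tensor conventions are illustrative rather than taken from any particular released implementation.

```python
import torch

@torch.no_grad()
def meanflow_sample(u_theta, shape, device="cpu"):
    """One-step MeanFlow sampling: x = eps - u_theta(eps, 0, 1).

    `u_theta` is assumed to map (z, r, t) to the average velocity over [r, t];
    the (r, t) = (0, 1) choice follows the canonical one-step case above.
    """
    eps = torch.randn(shape, device=device)   # z_1 ~ N(0, I)
    r = torch.zeros(shape[0], device=device)  # interval start r = 0
    t = torch.ones(shape[0], device=device)   # interval end   t = 1
    return eps - u_theta(eps, r, t)           # single network evaluation
```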

The MeanFlow loss is

$$\mathcal{L}_{\rm MF} = \mathbb{E}_{r,t,\,x_0,\,x_t}\left\|u_\theta(x_t, r, t) - u_{\rm target}(x_t, r, t)\right\|^2$$

where the target $u_{\rm target}$ is given by the identity above and a stop-gradient is applied to the higher-order terms for training stability.
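A minimal training-step sketch is given below. It assumes a linear interpolation path $z_t = (1-t)x + t\epsilon$, so that the conditional instantaneous velocity is $v = \epsilon - x$, and a network `u_theta(z, r, t)`; these conventions and names are illustrative and may differ in detail from the cited implementations.

```python
import torch
from torch.func import jvp

def meanflow_loss(u_theta, x, eps, r, t):
    """Sketch of the MeanFlow regression loss with a JVP-based target.

    Assumes the linear path z_t = (1 - t) * x + t * eps, for which the
    conditional instantaneous velocity is v = eps - x.
    """
    t_ = t.view(-1, *([1] * (x.dim() - 1)))  # broadcast t over data dims
    r_ = r.view(-1, *([1] * (x.dim() - 1)))
    z_t = (1 - t_) * x + t_ * eps
    v = eps - x                              # conditional velocity target

    # Total derivative of u along the flow: the tangent (v, 0, 1) for (z, r, t)
    # gives v . grad_z u + d_t u, the bracketed term in the MeanFlow identity.
    u, dudt = jvp(u_theta, (z_t, r, t),
                  (v, torch.zeros_like(r), torch.ones_like(t)))

    # MeanFlow identity as a regression target, with stop-gradient for stability.
    u_tgt = (v - (t_ - r_) * dudt).detach()
    return ((u - u_tgt) ** 2).mean()
```

Note that when $r = t$ the target collapses to $v$, recovering plain flow matching; this is what makes the blended supervision discussed later straightforward.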

2. Relation to Flow Matching and Diffusion Models

MeanFlow generalizes and interpolates between standard Denoising Diffusion Probabilistic Models (DDPM), Conditional Flow Matching (CFM), and consistency models:

  • DDPM: Minimizes $\|\epsilon - \epsilon_\theta\|^2$ over many noise levels, requiring hundreds of steps per sample.
  • CFM: Minimizes $\|v_\theta(x_t, t) - (x_1 - x_0)\|^2$ for the instantaneous velocity, typically requiring 50–100 ODE steps.
  • MeanFlow: Regresses the interval-averaged velocity and achieves O(1)-step inference, avoiding the integration errors incurred by Euler or higher-order solvers (contrast the Euler sampler sketched after this list).
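To make the inference-cost contrast concrete, the sketch below shows a plain Euler sampler for a learned instantaneous velocity `v_theta(z, t)`; the signature and the noise-at-$t=1$ convention are assumptions for illustration, to be compared with the one-step sampler sketched in Section 1.

```python
import torch

@torch.no_grad()
def euler_sample(v_theta, shape, num_steps=100, device="cpu"):
    """Multi-step sampling for an instantaneous velocity field: Euler-integrate
    dz/dt = v(z, t) from the noise end (t = 1) down to the data end (t = 0).
    Each of the `num_steps` iterations costs one network evaluation, whereas
    the one-step MeanFlow sampler needs exactly one in total."""
    z = torch.randn(shape, device=device)  # start from noise at t = 1
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), 1.0 - i * dt, device=device)
        z = z - dt * v_theta(z, t)         # Euler step toward t = 0
    return z
```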

Empirically, MeanFlow closes much of the quality gap between one-step and multistep approaches, e.g., FID 3.43 on ImageNet 256, which sharply outperforms previous single-step models (Geng et al., 19 May 2025, Agarwal et al., 26 Nov 2025).

Recent unifying frameworks such as Modular MeanFlow (You et al., 24 Aug 2025) and α-Flow (Zhang et al., 23 Oct 2025) show that MeanFlow, shortcut consistency models, and flow matching are connected by smooth interpolations in the corresponding training objectives.

3. Training Algorithms and Architectural Considerations

Efficient and stable MeanFlow training requires attention to the coupling of instantaneous and average velocity learning:

  • JVP implementation: Training uses JVPs to compute the chain-rule correction in the MeanFlow identity, adding modest training overhead (~20%).
  • Combined loss: Many implementations blend direct instantaneous-velocity supervision ($r = t$) with average-velocity supervision ($r < t$), sometimes using curricula or bias schedules for stability and the bias–variance tradeoff; see the interval-sampling sketch after this list (You et al., 24 Aug 2025, Kim et al., 24 Nov 2025).
  • Architecture: Backbone can be UNet or Transformer (DiT), with time conditioning via sinusoidal/time-MLP embeddings. For class-conditional generation, classifier-free guidance (CFG) can be integrated seamlessly into network outputs without additional passes (Agarwal et al., 26 Nov 2025, Li et al., 8 Aug 2025).
  • Enhanced training: Task-specific curriculum strategies—such as staged formation of instantaneous, then average, velocities; time-dependent weighting; or bias-variance interpolation—yield faster and higher-quality convergence (Kim et al., 24 Nov 2025, You et al., 24 Aug 2025, Zhang et al., 23 Oct 2025).
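One simple way to realize the combined supervision referenced in the list above is to sample the interval endpoints so that a tunable fraction of each batch uses $r = t$, which collapses the MeanFlow target to the plain flow-matching target; the sketch below is illustrative, and the mixing ratio and any curriculum on it are assumptions rather than values from the cited papers.

```python
import torch

def sample_interval(batch_size, p_equal=0.75, device="cpu"):
    """Sample interval endpoints (r, t) with 0 <= r <= t <= 1.

    With probability `p_equal`, set r = t so the MeanFlow target reduces to
    instantaneous (flow-matching) supervision; otherwise use a genuine
    interval r < t. The ratio can follow a curriculum, e.g. annealed over
    training from mostly r = t toward mostly r < t.
    """
    t = torch.rand(batch_size, device=device)
    r = torch.rand(batch_size, device=device) * t          # guarantees r <= t
    use_equal = torch.rand(batch_size, device=device) < p_equal
    r = torch.where(use_equal, t, r)
    return r, t
```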

In high-dimensional domains (e.g., high-res images), latent-space approaches (MeanFlow-RAE) and teacher-distillation pipelines further reduce training cost and stabilize optimization (Hu et al., 17 Nov 2025).

4. Practical Applications and Empirical Benchmarks

MeanFlow has been deployed in a broad spectrum of domains:

| Application | Model | Inference Steps | Main Metric / Speed | Reference |
|---|---|---|---|---|
| Image generation | MeanFlow XL/2 | 1 | FID 3.43 (ImageNet 256) | (Geng et al., 19 May 2025) |
| Image generation | MeanFlow-RAE | 1 | FID 2.03 (ImageNet 256), –38% GFLOPs | (Hu et al., 17 Nov 2025) |
| Speech enhancement | MeanFlowSE | 1 | SI-SDR 19.98 dB, RTF 0.11 | (Li et al., 18 Sep 2025) |
| Text-to-audio | MeanAudio | 1 | FAD 1.77, RTF 0.013 | (Li et al., 8 Aug 2025) |
| Video-to-audio | MeanFlow-MJT | 1 | FAD 1.46, RTF 0.007 | (Yang et al., 8 Sep 2025) |
| Robotic manipulation | MP1, DM1 | 1 | Success ↑ 10–19 pp, 6.8 ms latency (19× faster) | (Sheng et al., 14 Jul 2025; Zou et al., 9 Oct 2025) |
| 3D dance generation | FlowerDance | 5–20 | FID_k 29.7, 2008 FPS | (Yang et al., 26 Nov 2025) |

Typical speedups are 10–100× over diffusion and multi-step flow matching, with single-step models delivering high-fidelity outputs.

5. Extensions, Variants, and Theoretical Understanding

  • Curriculum and modular losses: Modular MeanFlow (You et al., 24 Aug 2025) introduces a tunable gradient-blocking operator (stop-gradient with interpolation λ) and a curriculum on λ, smoothing the transition from first-order (stable) to second-order (expressive) supervision; a sketch of such an operator follows this list.
  • Trajectory consistency: α-Flow (Zhang et al., 23 Oct 2025) decomposes MeanFlow losses into trajectory flow-matching and trajectory consistency terms, revealing negative gradient correlations and motivating α-annealed training schedules.
  • High-order MeanFlow: Extensions model average acceleration (second derivatives) and establish that second-order MeanFlow achieves even lower truncation error (O(Δt³)), and can be efficiently approximated with modern hardware (Cao et al., 9 Aug 2025).
  • Transport-based MeanFlow (OT-MF): Incorporates optimal-transport mini-batch couplings to better align paths between noise and target distributions, improving one-step alignment in multimodal or geometric domains (Akbari et al., 26 Sep 2025).
  • Architectural compression: MeanFlow modules can replace stacks of ResNet blocks, yielding smaller, parameter-efficient yet accurate discriminative architectures (MFI-ResNet) (Sun et al., 16 Nov 2025).
  • Q-learning policies: Residual reformulations enable direct compatibility with Bellman backups in offline RL, capturing multimodality and stability not afforded by naive one-step MeanFlow (Wang et al., 17 Nov 2025).
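The gradient-blocking operator mentioned in the Modular MeanFlow item above admits a very short implementation; the sketch below is one plausible reading of "stop-gradient with interpolation λ" and is not taken verbatim from the paper.

```python
import torch

def soft_stopgrad(x, lam):
    """Interpolated stop-gradient: lam = 0.0 is a full stop-gradient
    (first-order, stable), lam = 1.0 passes gradients through unchanged
    (second-order, expressive). The forward value equals x for any lam;
    only the backward path changes."""
    return lam * x + (1.0 - lam) * x.detach()
```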

6. Limitations, Challenges, and Future Directions

MeanFlow’s main trade-off is a small increase in one-step model bias versus iterative approaches, though in practice the gap is now minimal for strong backbones and proper training (Geng et al., 19 May 2025, Agarwal et al., 26 Nov 2025, Lee et al., 28 Oct 2025). Key open problems and directions include:

  • Further reducing reliance on Jacobian computations (or replacing JVPs with cheaper surrogates).
  • Adaptive schedules for interval selection and weighting during training (Kim et al., 24 Nov 2025).
  • Generalizing to discrete or hybrid data spaces, higher-order flows, and domains requiring very high-fidelity geometric or semantic preservation.
  • Theoretical analysis of approximation errors induced by mean-velocity field learning under finite data, and expressivity limitations tied to backbone architectures (Cao et al., 9 Aug 2025).
  • Integration with advanced guidance, mixture-based flows, or controller regularization for policy learning and control.

7. Contextualization: MeanFlow in Broader Scientific Practice

MeanFlow, conceived as a self-contained alternative to conventional iterative solvers, represents a significant evolution in generative modeling: it provides an explicit, mathematically grounded average-velocity target, enables principled one-step mapping, and supports robust adaptation to challenging tasks spanning images, audio, control, and spatiotemporal domains. By revealing the deep connection between mean and instantaneous velocity fields, and offering extensible curricula and loss interpolations, MeanFlow now forms the core of a growing ecosystem of highly efficient generative models (Geng et al., 19 May 2025, Agarwal et al., 26 Nov 2025, You et al., 24 Aug 2025, Zhang et al., 23 Oct 2025, Zou et al., 9 Oct 2025, Hu et al., 17 Nov 2025, Kim et al., 24 Nov 2025).
