MeanFlow Identity: Fast One-Step Generation
- MeanFlow Identity is a mathematical framework that links average interval velocity to instantaneous velocity for efficient one-step generative modeling.
- It derives a principled training loss using interval integration, enabling fast inference without the need for iterative ODE integration.
- MeanFlow offers significant improvements in multimodal synthesis, achieving real-time performance in tasks like video-to-audio generation.
The MeanFlow identity is a mathematical and algorithmic construct underpinning recent advancements in efficient, one-step generative modeling. It formalizes the relationship between average (interval-aggregated) and instantaneous velocity fields in flow-based generative trajectories, enabling direct, non-iterative sample generation with substantial improvements in inference speed and scalability, particularly for multimodal video-to-audio (VTA) synthesis and related domains.
1. Mathematical Formulation and Definition
The MeanFlow identity emerges from a generalization of flow matching in continuous-time generative models. Traditional flow matching learns the instantaneous velocity along the trajectory connecting a sample from a prior to a data distribution via the ODE:

$$\frac{dz_t}{dt} = v(z_t, t), \qquad z_t = (1 - t)\,x + t\,\varepsilon, \quad x \sim p_{\text{data}},\ \varepsilon \sim \mathcal{N}(0, I),$$

with ground-truth (conditional) instantaneous velocity $v_t = \varepsilon - x$. The model is trained by regressing the network to the true velocity at interpolated points $z_t$, usually requiring iterative ODE integration for sample generation.
MeanFlow reframes this by modeling the average velocity field over an interval $[r, t]$ as:

$$u(z_t, r, t) = \frac{1}{t - r}\int_r^t v(z_\tau, \tau)\,d\tau.$$

Crucially, the MeanFlow identity ties this average velocity to the instantaneous velocity at time $t$ via:

$$u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt}u(z_t, r, t),$$

where the total derivative

$$\frac{d}{dt}u(z_t, r, t) = v(z_t, t)\,\partial_z u(z_t, r, t) + \partial_t u(z_t, r, t)$$

accounts for both explicit time dependence and trajectory state evolution.
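As a quick numerical sanity check of the identity (a toy 1-D velocity field chosen here for illustration, not taken from the source), note that the average velocity over $[r, t]$ equals displacement divided by elapsed time, and the total derivative can be approximated by a finite difference along the trajectory:

```python
import numpy as np

# Toy 1-D field v(z, t) = sin(t) * z; the ODE dz/dt = v(z, t) has the
# closed-form trajectory z(t) = z_r * exp(cos(r) - cos(t)).
def z_traj(z_r, r, t):
    return z_r * np.exp(np.cos(r) - np.cos(t))

def v(z, t):
    return np.sin(t) * z

def u_avg(z_r, r, t):
    # Average velocity over [r, t] = displacement / elapsed time.
    return (z_traj(z_r, r, t) - z_r) / (t - r)

r, t, z_r = 0.3, 0.9, 1.5
z_t = z_traj(z_r, r, t)

# Total derivative d/dt u(z_t, r, t) along the trajectory (r held fixed),
# approximated by a central finite difference in t.
h = 1e-5
du_dt = (u_avg(z_r, r, t + h) - u_avg(z_r, r, t - h)) / (2 * h)

print(u_avg(z_r, r, t))              # left-hand side: u(z_t, r, t)
print(v(z_t, t) - (t - r) * du_dt)   # right-hand side: v - (t - r) du/dt
# The two printed values agree up to finite-difference error.
```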
2. Training and Inference Procedures
The MeanFlow identity determines the network's supervision target during training: a neural predictor $u_\theta(z_t, r, t)$ is optimized to satisfy

$$u_\theta(z_t, r, t) \approx u_{\text{tgt}},$$

enforced via the regression loss $\mathcal{L}(\theta) = \mathbb{E}\,\|u_\theta(z_t, r, t) - u_{\text{tgt}}\|_2^2$, where

$$u_{\text{tgt}} = \mathrm{sg}\!\left(v_t - (t - r)\bigl(v_t\,\partial_z u_\theta + \partial_t u_\theta\bigr)\right)$$

and $\mathrm{sg}(\cdot)$ indicates a stop-gradient to avoid higher-order differentiation.
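A minimal PyTorch sketch of this training step, assuming a network `u_net(z, r, t)` that predicts the average velocity (the function name, tensor shapes, and the way `r` and `t` are supplied are illustrative assumptions, not prescribed by the source); the total derivative is obtained with a single Jacobian-vector product:

```python
import torch
from torch.func import jvp

def meanflow_loss(u_net, x, eps, r, t):
    # x: data batch (B, D); eps: prior noise ~ N(0, I), shape (B, D);
    # r, t: interval endpoints, shape (B, 1), with r <= t.
    z_t = (1 - t) * x + t * eps          # linear interpolation path
    v_t = eps - x                        # conditional instantaneous velocity

    # Total derivative d/dt u_theta(z_t, r, t) along the trajectory:
    # tangents are (dz_t/dt, dr/dt, dt/dt) = (v_t, 0, 1).
    u_pred, du_dt = jvp(
        lambda z, r_, t_: u_net(z, r_, t_),
        (z_t, r, t),
        (v_t, torch.zeros_like(r), torch.ones_like(t)),
    )

    # MeanFlow target with stop-gradient (detach) to avoid higher-order terms.
    u_tgt = (v_t - (t - r) * du_dt).detach()
    return ((u_pred - u_tgt) ** 2).mean()
```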
At inference, sample generation proceeds in a single evaluation using:

$$z_r = z_t - (t - r)\,u_\theta(z_t, r, t).$$

Most typically $r = 0$ and $t = 1$, so that

$$z_0 = z_1 - u_\theta(z_1, 0, 1), \qquad z_1 = \varepsilon \sim \mathcal{N}(0, I),$$

yielding a fast, direct mapping from prior to data space without iterative denoising.
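A corresponding one-step sampler can be sketched as follows (again assuming the illustrative `u_net(z, r, t)` interface used above):

```python
import torch

@torch.no_grad()
def sample_one_step(u_net, shape, device="cpu"):
    # Draw z_1 from the prior at t = 1 and map it directly to data space
    # at r = 0 with a single evaluation: z_0 = z_1 - u_theta(z_1, 0, 1).
    z1 = torch.randn(shape, device=device)
    r = torch.zeros(shape[0], 1, device=device)
    t = torch.ones(shape[0], 1, device=device)
    return z1 - (t - r) * u_net(z1, r, t)
```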
3. Contrast with Instantaneous Velocity Methods
Traditional flow matching or diffusion models depend on accurately modeling instantaneous velocities and numerically integrating the corresponding ODE (sometimes tens or hundreds of steps):
| Method | Target Field | Sampling Mechanism | Inference Speed |
|---|---|---|---|
| Flow Matching (FM) | $v(z_t, t)$ (instantaneous) | ODE integration (multi-step) | Slow |
| MeanFlow | $u(z_t, r, t)$ (average) | One-step flow map | Fast |
The MeanFlow identity structurally guarantees that the learned average velocity $u$ accumulates the same total transport as sequential integration of the instantaneous velocity $v$, so one-step sampling incurs minimal discretization error and little compromise in quality.
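For contrast, a minimal Euler-integration sampler for a flow-matching baseline (hypothetical `v_net(z, t)` predicting the instantaneous velocity; the step count is an illustrative choice) needs one network evaluation per integration step:

```python
import torch

@torch.no_grad()
def sample_fm_euler(v_net, shape, n_steps=100, device="cpu"):
    # Integrate dz/dt = v(z, t) from t = 1 (prior) back to t = 0 (data)
    # with explicit Euler steps; cost grows linearly with n_steps.
    z = torch.randn(shape, device=device)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((shape[0], 1), 1.0 - i * dt, device=device)
        z = z - dt * v_net(z, t)
    return z
```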
4. Implications for Multimodal Generative Tasks
In multimodal synthesis—for instance, video-to-audio generation—the identity empowers direct sample generation that preserves semantic and temporal alignment:
- Efficiency: Substantial reduction in inference time from eliminating iterative sampling (real-time factor, RTF, improved from 0.015 to 0.007 in VTA synthesis).
- Quality: Maintains perceptual and temporal fidelity; empirical results confirm no significant compromise in alignment or synchronization.
- Flexibility: The framework can be extended from one-step to multi-step inference, trading speed for quality as required (see the sampler sketch after this list).
- Simplified Architecture: No need for auxiliary distillation or pretraining stages commonly used in previous acceleration approaches.
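A sketch of the multi-step extension referenced above (interval grid and the `u_net(z, r, t)` interface are illustrative assumptions): partition $t = 1 \to 0$ into $k$ sub-intervals and apply the same flow-map update on each.

```python
import torch

@torch.no_grad()
def sample_k_steps(u_net, shape, k=4, device="cpu"):
    # Split t = 1 -> 0 into k equal sub-intervals; each step applies
    # z_r = z_t - (t - r) * u_theta(z_t, r, t). More steps favor quality.
    z = torch.randn(shape, device=device)
    grid = [1.0 - i / k for i in range(k + 1)]
    for t_hi, t_lo in zip(grid[:-1], grid[1:]):
        t = torch.full((shape[0], 1), t_hi, device=device)
        r = torch.full((shape[0], 1), t_lo, device=device)
        z = z - (t - r) * u_net(z, r, t)
    return z
```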
5. Theoretical Significance and Broader Context
The MeanFlow identity encodes a principled bridge between local and global trajectory statistics in continuous-time models. Notably:
- When $r = t$, the objective reduces to the classic flow matching loss, highlighting MeanFlow as a strict generalization of flow matching (made explicit after this list).
- The approach is differentiable and compatible with modern autodiff frameworks, permitting efficient computation of the necessary Jacobian-vector products for training.
- The identity offers a mathematically grounded avenue for efficient generative modeling, in contrast to network-centric consistency constraints or shortcut models that only operate at the level of network outputs.
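To make the limiting case in the first bullet explicit, setting $r = t$ in the definitions above gives

$$u(z_t, t, t) = \lim_{r \to t}\frac{1}{t - r}\int_r^t v(z_\tau, \tau)\,d\tau = v(z_t, t),$$

and the factor $(t - r)$ eliminates the total-derivative term from the target, so $u_{\text{tgt}} = v_t$ and the training objective collapses to the standard flow matching regression $\mathbb{E}\,\|u_\theta(z_t, t, t) - v_t\|_2^2$.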
6. Key Equations Table
| Component | Equation | Description |
|---|---|---|
| Instantaneous velocity | $v(z_t, t) = \frac{dz_t}{dt}$ | Used in FM (iterative) |
| Average velocity (MeanFlow) | $u(z_t, r, t) = \frac{1}{t - r}\int_r^t v(z_\tau, \tau)\,d\tau$ | Interval mean velocity |
| MeanFlow identity | $u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt}u(z_t, r, t)$ | Core differential relationship |
| Sampling update | $z_r = z_t - (t - r)\,u_\theta(z_t, r, t)$ | Enables one-step generation |
7. Impact and Prospective Directions
Deployment of the MeanFlow identity within multimodal video-to-audio synthesis (and wider generative domains) establishes a new standard for efficiency, scalability, and simplicity in generative architectures. Practical advantages include real-time generative capability for interactive media, dubbing, and accessibility solutions. The framework's abstraction over local velocity fields and compatibility with sequential or multimodal conditioning render it highly adaptable to future research directions in accelerated, high-fidelity generative modeling.