
MeanFlow Identity: Fast One-Step Generation

Updated 3 November 2025
  • MeanFlow Identity is a mathematical framework that links average interval velocity to instantaneous velocity for efficient one-step generative modeling.
  • It derives a principled training loss using interval integration, enabling fast inference without the need for iterative ODE integration.
  • MeanFlow offers significant improvements in multimodal synthesis, achieving real-time performance in tasks like video-to-audio generation.

The MeanFlow identity is a mathematical and algorithmic construct underpinning recent advancements in efficient, one-step generative modeling. It formalizes the relationship between average (interval-aggregated) and instantaneous velocity fields in flow-based generative trajectories, enabling direct, non-iterative sample generation with substantial improvements in inference speed and scalability, particularly for multimodal video-to-audio (VTA) synthesis and related domains.

1. Mathematical Formulation and Definition

The MeanFlow identity emerges from a generalization of flow matching in continuous-time generative models. Traditional flow matching learns the instantaneous velocity $\bm v(\bm z_t, t)$ along the trajectory connecting a sample from a prior $p_\text{prior}$ to a data distribution $p_\text{data}$ via the ODE:

$$\frac{d\bm z_t}{dt} = \bm v(\bm z_t, t), \qquad \bm z_t = (1-t)\bm x + t\bm\epsilon$$

with ground-truth instantaneous velocity $\bm v(\bm z_t, t) = \bm\epsilon - \bm x$. The model is trained by regressing the network $\bm v_\theta$ onto the true velocity at interpolated points $(\bm z_t, t)$; generating a sample then usually requires iterative ODE integration.
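The interpolation and its velocity target can be sketched in a few lines of plain Python. This is a minimal illustration for vectors stored as lists, not code from any MeanFlow implementation; the function names `interpolate` and `conditional_velocity` are hypothetical.

```python
import random


def interpolate(x, eps, t):
    """Return z_t = (1 - t) * x + t * eps, elementwise."""
    return [(1.0 - t) * xi + t * ei for xi, ei in zip(x, eps)]


def conditional_velocity(x, eps):
    """Return d z_t / dt = eps - x, which is constant in t on this straight path."""
    return [ei - xi for xi, ei in zip(x, eps)]


# Toy example: a 3-dimensional "data" point and a Gaussian prior sample.
random.seed(0)
x = [1.0, -2.0, 0.5]
eps = [random.gauss(0.0, 1.0) for _ in x]

z_half = interpolate(x, eps, 0.5)        # midpoint of the path
v_target = conditional_velocity(x, eps)  # regression target for v_theta at (z_t, t)
```

The path starts at the data point for $t = 0$ and ends at the prior sample for $t = 1$, so sampling has to run the ODE from $t = 1$ back to $t = 0$.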

MeanFlow reframes this by modeling the average velocity field over an interval $[r, t]$ as:

$$\bm u(\bm z_t, r, t) = \frac{1}{t - r} \int_{r}^{t} \bm v(\bm z_\tau, \tau)\, d\tau$$

Crucially, the MeanFlow identity ties this average velocity to the instantaneous velocity at $t$ via:

$$\bm u(\bm z_t, r, t) = \bm v(\bm z_t, t) - (t - r) \frac{d}{dt} \bm u(\bm z_t, r, t)$$

where the total derivative:

$$\frac{d}{dt} \bm u(\bm z_t, r, t) = \bm v(\bm z_t, t)\, \partial_{\bm z} \bm u + \partial_t \bm u$$

accounts for both explicit time dependence and trajectory state evolution.
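The identity follows from the definition of the average velocity in three short steps. The derivation below is added as a reading aid; it holds $r$ fixed and uses only the product rule and the fundamental theorem of calculus.

```latex
\begin{align*}
% Step 1: clear the denominator in the definition of the average velocity.
(t - r)\,\bm u(\bm z_t, r, t) &= \int_{r}^{t} \bm v(\bm z_\tau, \tau)\, d\tau \\
% Step 2: differentiate both sides with respect to t (product rule on the left).
\bm u(\bm z_t, r, t) + (t - r)\,\frac{d}{dt}\bm u(\bm z_t, r, t) &= \bm v(\bm z_t, t) \\
% Step 3: rearrange to isolate the average velocity.
\bm u(\bm z_t, r, t) &= \bm v(\bm z_t, t) - (t - r)\,\frac{d}{dt}\bm u(\bm z_t, r, t)
\end{align*}
```

Because $\bm z_t$ itself moves with $t$, the derivative in step 2 is a total derivative, which is why the chain-rule term $\bm v(\bm z_t, t)\,\partial_{\bm z}\bm u$ appears next to $\partial_t \bm u$.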

2. Training and Inference Procedures

The MeanFlow identity determines the network's supervision target during training: a neural predictor $\bm u_\theta$ is optimized to satisfy

$$\mathcal{L}_\text{MF}(\theta) = \mathbb{E}_{r, t, \bm x, \bm\epsilon} \left\|\bm u_\theta(\bm z_t, r, t) - \text{sg}(\text{tgt})\right\|^2$$

where

$$\text{tgt} = \bm v_t - (t - r) \left(\bm v_t\, \partial_{\bm z} \bm u_\theta + \partial_t \bm u_\theta\right)$$

and $\text{sg}(\cdot)$ indicates a stop-gradient to avoid higher-order differentiation.
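The target computation can be sketched on a scalar toy problem. Everything below is an assumption made for illustration: the field $v(z, t) = -z$ is not the flow-matching field, the closed-form `u_exact` stands in for a trained network, and a central finite difference replaces the Jacobian-vector product that an autodiff framework would provide. With no autodiff involved, the stop-gradient simply corresponds to treating the target as a plain number.

```python
import math


def directional_derivative(u_fn, z, r, t, v, h=1e-5):
    """Central-difference estimate of v * du/dz + du/dt (the total derivative).

    Stand-in for a Jacobian-vector product along the tangent (v, 0, 1).
    """
    return (u_fn(z + h * v, r, t + h) - u_fn(z - h * v, r, t - h)) / (2.0 * h)


def meanflow_target(u_fn, z, r, t, v):
    """tgt = v - (t - r) * (v * du/dz + du/dt), used as a fixed regression target."""
    return v - (t - r) * directional_derivative(u_fn, z, r, t, v)


def meanflow_loss(u_fn, z, r, t, v):
    """Squared error between the prediction and the fixed target for one sample."""
    return (u_fn(z, r, t) - meanflow_target(u_fn, z, r, t, v)) ** 2


def v_exact(z, t):
    """Toy instantaneous velocity: dz/dt = -z."""
    return -z


def u_exact(z, r, t):
    """Closed-form average velocity of v_exact over [r, t]."""
    s = t - r
    return v_exact(z, t) if s == 0.0 else z * (1.0 - math.exp(s)) / s


def u_wrong(z, r, t):
    """A predictor that ignores the interval and just returns v(z, t)."""
    return -z


z, r, t = 0.7, 0.2, 0.9
loss_exact = meanflow_loss(u_exact, z, r, t, v_exact(z, t))  # close to zero
loss_wrong = meanflow_loss(u_wrong, z, r, t, v_exact(z, t))  # clearly positive
```

The exact average velocity satisfies the identity, so its loss vanishes up to finite-difference error, while the interval-agnostic predictor is penalized. In actual training the velocity passed in would be the sampled $\bm\epsilon - \bm x$ rather than a known field.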

At inference, sample generation proceeds in a single evaluation using:

$$\bm z_r = \bm z_t - (t - r)\,\bm u_\theta(\bm z_t, r, t)$$

Most typically, $(r, t) = (0, 1)$ so that

$$\bm z_0 = \bm\epsilon - \bm u_\theta(\bm\epsilon, 0, 1)$$

yielding fast, direct mapping from prior to data space without iterative denoising.
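A minimal sketch of the one-step sampler, assuming a scalar state and a callable `u_fn(z, r, t)` in place of the trained network. The names `meanflow_step`, `sample_one_step`, and `u_point_mass` are hypothetical. The stand-in model uses the fact that, when the data distribution is a single point $A$, the linear path has exact average velocity $(z - A)/t$.

```python
import random


def meanflow_step(u_fn, z_t, r, t):
    """Apply z_r = z_t - (t - r) * u(z_t, r, t) once."""
    return z_t - (t - r) * u_fn(z_t, r, t)


def sample_one_step(u_fn, eps):
    """Map a prior sample eps (time t = 1) to data space (time r = 0) in one call."""
    return meanflow_step(u_fn, eps, 0.0, 1.0)


# Hypothetical stand-in for a trained network: all data mass sits at the point A.
A = 2.5


def u_point_mass(z, r, t):
    """Exact average velocity for the point-mass toy problem (valid for t > 0)."""
    return (z - A) / t


random.seed(0)
samples = [sample_one_step(u_point_mass, random.gauss(0.0, 1.0)) for _ in range(5)]
# Every prior draw is carried to the single data point A, up to rounding.
```

Each sample costs exactly one evaluation of `u_fn`, which is the source of the speed-up over multi-step ODE solvers.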

3. Contrast with Instantaneous Velocity Methods

Traditional flow matching or diffusion models depend on accurately modeling instantaneous velocities and numerically integrating the corresponding ODE (sometimes tens or hundreds of steps):

| Method | Target Field | Sampling Mechanism | Inference Speed |
|---|---|---|---|
| Flow Matching (FM) | $\bm v$ (instantaneous) | ODE integration (multi-step) | Slow |
| MeanFlow | $\bm u$ (average) | One-step flow map | Fast |

The MeanFlow identity ensures that an average velocity satisfying it transports a sample over $[r, t]$ by exactly the displacement that integrating the instantaneous velocity would produce. A single update therefore incurs no time-discretization error; the remaining error comes from how well the network approximates $\bm u$.
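The contrast with multi-step integration can be illustrated on a toy problem with curved trajectories. The sketch below assumes the scalar field $v(z, t) = -z$, chosen only because both its ODE solution and its average velocity have closed forms; it is not the flow-matching field, and `u_field` stands in for a perfectly trained network.

```python
import math


def v_field(z, t):
    """Toy instantaneous velocity with curved trajectories: dz/dt = -z."""
    return -z


def u_field(z, r, t):
    """Exact average velocity of v_field over [r, t], for t > r."""
    s = t - r
    return z * (1.0 - math.exp(s)) / s


def euler_integrate_to_zero(z1, n_steps):
    """Integrate dz/dt = v from t = 1 down to t = 0 with n_steps explicit Euler steps."""
    z, dt = z1, 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt
        z = z - dt * v_field(z, t)
    return z


z1 = 0.4
exact = z1 * math.e                                  # closed-form solution at t = 0
one_step = z1 - (1.0 - 0.0) * u_field(z1, 0.0, 1.0)  # single average-velocity update
euler_errors = {n: abs(euler_integrate_to_zero(z1, n) - exact) for n in (1, 10, 100)}
```

Here the single average-velocity update reproduces the exact endpoint, whereas Euler integration of the instantaneous field only approaches it as the step count grows.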

4. Implications for Multimodal Generative Tasks

In multimodal synthesis, for instance video-to-audio generation, the identity enables direct sample generation that preserves semantic and temporal alignment:

  • Efficiency: Roughly a twofold reduction in inference time in VTA synthesis (real-time factor, RTF, improved from 0.015 to 0.007).
  • Quality: Maintains perceptual and temporal fidelity; empirical results confirm no significant compromise in alignment or synchronization.
  • Flexibility: Framework can be extended from one-step to multi-step inference, trading speed for quality as required.
  • Simplified Architecture: No need for auxiliary distillation or pretraining stages commonly used in previous acceleration approaches.
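The flexibility point can be made concrete with a few-step sampler that chains the same update over a uniform time grid. This is a sketch under stated assumptions: a scalar state, the toy field $dz/dt = -z$, and two hypothetical stand-ins for a network, one exact (`u_exact`) and one deliberately imperfect (`u_biased`).

```python
import math


def sample_few_steps(u_fn, eps, n_steps):
    """Chain n_steps average-velocity updates on a uniform grid from t = 1 to t = 0."""
    z = eps
    for k in range(n_steps):
        t = 1.0 - k / n_steps
        r = 1.0 - (k + 1) / n_steps
        z = z - (t - r) * u_fn(z, r, t)  # same update rule as the one-step case
    return z


def u_exact(z, r, t):
    """Exact average velocity for dz/dt = -z over [r, t], for t > r."""
    s = t - r
    return z * (1.0 - math.exp(s)) / s


def u_biased(z, r, t):
    """An imperfect model that ignores the interval length and returns v(z, t)."""
    return -z


eps = 0.4
target = eps * math.e  # exact endpoint of the toy ODE at t = 0
exact_outputs = [sample_few_steps(u_exact, eps, n) for n in (1, 2, 4)]
biased_errors = [abs(sample_few_steps(u_biased, eps, n) - target) for n in (1, 2, 4)]
```

With an exact average velocity the step count does not change the result. With an imperfect model, extra steps shrink the interval each prediction has to cover and reduce the error, at the cost of more network evaluations.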

5. Theoretical Significance and Broader Context

The MeanFlow identity encodes a principled bridge between local and global trajectory statistics in continuous-time models. Notably:

  • When $r = t$, the objective reduces to the classic flow matching loss, so flow matching is a special case of MeanFlow.
  • The approach is differentiable and compatible with modern autodiff frameworks, permitting efficient computation of the necessary Jacobian-vector products for training.
  • The identity offers a mathematically grounded avenue for efficient generative modeling, in contrast to network-centric consistency constraints or shortcut models that only operate at the level of network outputs.
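The reduction to flow matching in the first point above can be written out directly. Setting $r = t$ in the training target removes the derivative term, so the network is regressed onto the instantaneous velocity alone. The notation follows the training section, with $\bm v_t$ denoting $\bm v(\bm z_t, t) = \bm\epsilon - \bm x$.

```latex
\begin{align*}
\text{tgt}\,\big|_{r = t}
  &= \bm v_t - (t - t)\left(\bm v_t\,\partial_{\bm z}\bm u_\theta + \partial_t \bm u_\theta\right)
   = \bm v_t \\
\mathcal{L}_\text{MF}(\theta)\,\big|_{r = t}
  &= \mathbb{E}_{t, \bm x, \bm\epsilon}\,
     \left\|\bm u_\theta(\bm z_t, t, t) - \bm v_t\right\|^2
\end{align*}
```

The second line is the flow matching objective with $\bm u_\theta(\cdot, t, t)$ playing the role of $\bm v_\theta$.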

6. Key Equations Table

| Component | Equation | Description |
|---|---|---|
| Instantaneous velocity | $\bm v(\bm z_t, t)$ | Used in FM (iterative) |
| Average velocity (MeanFlow) | $\bm u(\bm z_t, r, t) = \frac{1}{t - r} \int_{r}^{t} \bm v(\bm z_\tau, \tau)\, d\tau$ | Interval mean velocity |
| MeanFlow identity | $\bm u(\bm z_t, r, t) = \bm v(\bm z_t, t) - (t - r)\frac{d}{dt} \bm u(\bm z_t, r, t)$ | Core differential relationship |
| Sampling update | $\bm z_r = \bm z_t - (t - r)\,\bm u(\bm z_t, r, t)$ | Enables one-step generation |

7. Impact and Prospective Directions

Deployment of the MeanFlow identity within multimodal video-to-audio synthesis (and wider generative domains) establishes a new standard for efficiency, scalability, and simplicity in generative architectures. Practical advantages include real-time generative capability for interactive media, dubbing, and accessibility solutions. The framework's abstraction over local velocity fields and compatibility with sequential or multimodal conditioning render it highly adaptable to future research directions in accelerated, high-fidelity generative modeling.
