
IntMeanFlow: Integral Velocity in Generative Models

Updated 5 February 2026
  • IntMeanFlow is a generative modeling approach that predicts time-averaged velocities using integral formulations instead of instantaneous values.
  • It leverages differential identities and gradient modulation to balance training stability and expressiveness for efficient one-step sampling.
  • Practical implementations in image synthesis, speech, and trajectory tasks demonstrate significant speed-ups and improved output quality.

IntMeanFlow refers to a class of methodologies and network architectures in generative modeling and flow-based learning that target the efficient prediction of time-averaged (integral) velocities rather than direct estimation of instantaneous velocities. This paradigm emerges from the need for fast-forward sampling, i.e., generating high-quality outputs in a small number of steps (often one), by leveraging relations between average and instantaneous velocity fields. Recent advances include the Modular MeanFlow (MMF) framework for image and trajectory modeling (You et al., 24 Aug 2025), the integral distillation approach for speech synthesis (Wang et al., 9 Oct 2025), and improved fast-forward flows on large-scale datasets (Geng et al., 1 Dec 2025), each contributing unique algorithms and theoretical insights.

1. Mathematical Foundation and Differential Identities

IntMeanFlow formalizes the relationship between the instantaneous velocity $v(x, t)$ and the average (integral) velocity $u(x_t, r, t)$ over an interval $[r, t]$. The key differential identity is

$$v(x_t, t) = u(x_t, r, t) + (t - r)\,\frac{d}{dt} u(x_t, r, t),$$

where

$$u(x_t, r, t) = \frac{1}{t - r} \int_r^t v(x_\tau, \tau)\, d\tau, \qquad \frac{d}{dt}u = \partial_t u + (\nabla_x u)\, v(x_t, t)$$

(You et al., 24 Aug 2025). In practical generative tasks with available endpoints $x_0, x_1$, linear interpolation is performed using $x_t = (1-\alpha)x_0 + \alpha x_1$ with $\alpha = \frac{t-r}{1-r}$, and the average velocity is approximated as

$$u(x_t, r, t) \approx \frac{x_1 - x_0}{t - r}.$$

This identity underpins regression objectives that match predicted mean velocities against discretized targets, enabling efficient learning and one-step sampling regimes.
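
As a concrete illustration, the following PyTorch-style sketch builds the interpolated state and the discretized average-velocity target from sampled endpoints, following the formulas above. The function name and tensor layout are illustrative assumptions, not details taken from the cited implementations.

```python
import torch

def make_meanflow_target(x0, x1, r, t):
    """Interpolated state and integral-velocity regression target (a sketch).

    x0, x1: endpoint samples of shape (batch, ...)
    r, t:   interval endpoints of shape (batch,) with r < t
    """
    # Broadcast the per-sample times over the remaining data dimensions.
    shape = (-1,) + (1,) * (x0.dim() - 1)
    r_, t_ = r.view(shape), t.view(shape)

    # Linear interpolation x_t = (1 - alpha) x0 + alpha x1 with alpha = (t - r) / (1 - r).
    alpha = (t_ - r_) / (1.0 - r_)
    x_t = (1.0 - alpha) * x0 + alpha * x1

    # Discretized average-velocity target u(x_t, r, t) ≈ (x1 - x0) / (t - r).
    u_target = (x1 - x0) / (t_ - r_)
    return x_t, u_target
```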

2. Loss Functions and Gradient Modulation

MeanFlow-inspired training objectives exploit the above differential relation. The full MMF loss is given by

$$\mathcal{L}_{\mathrm{full}} = \mathbb{E}_{x_0, x_1, r < t}\left\| u_\theta(x_t, r, t) + (t-r)\left[\partial_t u_\theta(x_t, r, t) + \nabla_x u_\theta(x_t, r, t) \cdot u_\theta(x_t, r, t)\right] - \frac{x_1 - x_0}{t - r} \right\|^2,$$

with an approximate form that applies a stop-gradient to the second (derivative) term. A gradient modulation mechanism interpolates between full backpropagation (expressive but unstable) and stop-gradient (stable but less expressive) via

$$\mathrm{SG}_\lambda[z] = \lambda z + (1-\lambda)\,\mathrm{stopgrad}(z) \quad \forall\, z,$$

where $\lambda \in [0, 1]$ enables annealing during training, typically in a curriculum schedule from $\lambda = 0$ to $\lambda = 1$ over a warmup period $T_{\mathrm{warmup}}$ (You et al., 24 Aug 2025). This yields robust and expressive training curves across data regimes.
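
A minimal sketch of this modulated objective is given below, using the target from the earlier sketch and `torch.func.jvp` to supply the total derivative. The helper names are illustrative, and detaching the JVP tangent direction is a simplifying assumption rather than a detail confirmed by the paper.

```python
import torch
from torch.func import jvp

def sg_lambda(z, lam):
    """Gradient modulation: SG_lambda[z] = lam * z + (1 - lam) * stopgrad(z)."""
    return lam * z + (1.0 - lam) * z.detach()

def mmf_loss(u_theta, x_t, r, t, u_target, lam):
    """MMF-style loss with gradient modulation (illustrative sketch).

    u_theta:  callable (x, r, t) -> average-velocity prediction
    u_target: discretized target (x1 - x0) / (t - r)
    lam:      modulation coefficient in [0, 1] (0 = stop-grad, 1 = full backprop)
    """
    shape = (-1,) + (1,) * (x_t.dim() - 1)

    # Total derivative d/dt u = ∂_t u + (∇_x u) · u via a JVP along the
    # direction (dx/dt, dr/dt, dt/dt) = (u, 0, 1); the tangent is detached
    # here as a simplification.
    u = u_theta(x_t, r, t)
    _, du_dt = jvp(u_theta, (x_t, r, t),
                   (u.detach(), torch.zeros_like(r), torch.ones_like(t)))

    # Modulate how much gradient flows through the derivative term.
    v_hat = u + (t - r).view(shape) * sg_lambda(du_dt, lam)
    return ((v_hat - u_target) ** 2).mean()
```

A linear warmup such as `lam = min(1.0, step / T_warmup)` would realize the curriculum schedule described above.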

3. Integral Velocity Distillation and Practical Algorithmic Realizations

The speech generation variant of IntMeanFlow (Wang et al., 9 Oct 2025) circumvents the computational overhead of Jacobian–vector products (JVP) and self-bootstrap instability apparent in earlier MeanFlow models. The procedure consists of:

  • Training a teacher flow-matching model with a high number of function evaluations (NFE).
  • Distilling the average velocity over intervals $[t, r]$ by rolling out the teacher in discrete steps and setting

$$\overline{v}_{\mathrm{teacher}}(z_t; t, r) = \frac{z_r - z_t}{r - t} \approx \frac{1}{r-t} \int_t^r v(z_\tau, \tau)\, d\tau,$$

  • Training the student to predict these targets via $L_{\mathrm{distill}} = \mathbb{E}_{t,r}\left[\|u_{\mathrm{student}}(z_t; t, r) - \overline{v}_{\mathrm{teacher}}(z_t; t, r)\|^2\right]$ (see the sketch after this list).
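
A minimal sketch of this distillation target follows, assuming an Euler rollout of the teacher's instantaneous velocity field; the solver, step count, and function names are assumptions rather than details from the paper.

```python
import torch

@torch.no_grad()
def teacher_average_velocity(teacher_v, z_t, t, r, n_steps=8):
    """Roll the teacher from t to r with Euler steps and return the implied
    average velocity (z_r - z_t) / (r - t)."""
    shape = (-1,) + (1,) * (z_t.dim() - 1)
    z, tau = z_t.clone(), t.clone()
    dtau = (r - t) / n_steps
    for _ in range(n_steps):
        z = z + dtau.view(shape) * teacher_v(z, tau)
        tau = tau + dtau
    return (z - z_t) / (r - t).view(shape)

def distill_loss(student_u, teacher_v, z_t, t, r):
    """L_distill = E[ || u_student(z_t; t, r) - v̄_teacher(z_t; t, r) ||^2 ]."""
    target = teacher_average_velocity(teacher_v, z_t, t, r)
    return ((student_u(z_t, t, r) - target) ** 2).mean()
```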

The method relies only on outer-loop backpropagation, which supports large batch sizes, reduces memory consumption, and improves stability relative to classical MeanFlow approaches. The introduced Optimal Step Sampling Search (O3S) algorithm applies coordinate-wise ternary search to the sampling schedule to further improve inference quality without runtime overhead (Wang et al., 9 Oct 2025).
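
The following is a hedged sketch of a coordinate-wise ternary search over a sampling schedule in the spirit of O3S; the objective, constraints, and exact search protocol of the published algorithm may differ.

```python
def o3s_style_search(eval_quality, n_steps, n_rounds=3, iters=12):
    """Coordinate-wise ternary search over interior sampling times (a sketch).

    eval_quality: callable mapping a schedule [0, s_1, ..., s_{n-1}, 1] to a
                  scalar dev-set score to minimize (e.g. a distortion metric).
    n_steps:      number of sampling steps (n_steps - 1 interior points).
    """
    # Start from a uniform schedule on [0, 1].
    sched = [i / n_steps for i in range(n_steps + 1)]
    for _ in range(n_rounds):
        for i in range(1, n_steps):              # optimize one interior point at a time
            lo, hi = sched[i - 1], sched[i + 1]  # keep the schedule monotone
            for _ in range(iters):               # standard ternary search on [lo, hi]
                m1 = lo + (hi - lo) / 3
                m2 = hi - (hi - lo) / 3
                q1 = eval_quality(sched[:i] + [m1] + sched[i + 1:])
                q2 = eval_quality(sched[:i] + [m2] + sched[i + 1:])
                if q1 < q2:
                    hi = m2
                else:
                    lo = m1
            sched[i] = (lo + hi) / 2
    return sched
```

Because each trial only re-runs inference on a small development set, the search adds no cost at deployment time, consistent with the claim of no runtime overhead.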

4. Unification of Generative Paradigms

IntMeanFlow subsumes multiple classes of generative objectives within a parameter-continuous family. Setting $\lambda = 0$ and $t = 1$ retrieves the first-order consistency-model loss, eliminating JVP computation. Allowing $r \to t$ and $\lambda = 1$ recovers instantaneous flow matching,

$$L(\theta) = \mathbb{E}\left[\|v_\theta(x, t) - f(x, t)\|^2\right].$$

Intermediate settings interpolate between full MeanFlow, StopGrad MeanFlow, Consistency Models, and standard flow-matching. This unification is supported by empirical results and tabulated method comparisons (You et al., 24 Aug 2025):

Method            | Loss Type | JVP Required? | Stop-Grad
Full MeanFlow     | 2nd-order | Yes           | No
Consistency Model | 1st-order | No            | Yes
StopGrad MeanFlow | Approx.   | No            | Yes
MMF (curriculum)  | Tunable   | Optional      | Partial
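
One way to make this interpolation concrete is a small helper that maps each regime in the table to its $(r, t, \lambda)$ sampling rule; the mapping below is illustrative, and the eps offsets are arbitrary choices rather than values from the papers.

```python
import torch

def sample_interval(regime, batch, eps=1e-3):
    """Return (r, t, lambda) for one training batch under the regimes above."""
    t = eps + (1.0 - eps) * torch.rand(batch)
    if regime == "flow_matching":        # r -> t, lambda = 1: instantaneous flow matching
        return t - eps, t, 1.0
    if regime == "consistency":          # lambda = 0, t = 1: first-order consistency loss
        return torch.rand(batch) * (1.0 - eps), torch.ones(batch), 0.0
    r = torch.rand(batch) * (t - eps)    # generic interval with r < t
    if regime == "stopgrad_meanflow":    # lambda = 0: stop-gradient on the derivative term
        return r, t, 0.0
    return r, t, 1.0                     # full MeanFlow; anneal lambda for the MMF curriculum
```

Annealing lambda from 0 toward 1 during training then traverses this family from the stop-gradient end toward full MeanFlow, as in the MMF curriculum.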

5. Empirical Results and Application Domains

Image synthesis and trajectory modeling: Curriculum-scheduled MMF (IntMeanFlow) achieves the lowest FID (e.g., FID = 3.41 on CIFAR-10 with 1 NFE), superior 1-MSE and LPIPS, and reduced inference time compared with full-gradient or stop-gradient variants (You et al., 24 Aug 2025). In few-shot settings, curriculum MMF substantially improves sample quality and out-of-distribution (OOD) generalization (8–20% FID reduction).

Speech synthesis: In token-to-spectrogram and text-to-spectrogram TTS, IntMeanFlow reaches near-teacher WER, speaker similarity, and UTMOS with 1–3 NFE, achieving ~10–20× speed-ups and substantially lower resource requirements compared to MeanFlow (Wang et al., 9 Oct 2025). O3S optimizes step placement for quality under fixed NFE.

Fast-forward generative modeling: Improved MeanFlow (iMF) further refines the objective and guidance mechanism by recasting the loss on $v$ with an average-velocity predictor $u$, enabling stable one-step sampling. iMF attains FID = 1.72 on ImageNet 256×256 with 1 NFE, surpassing prior MeanFlow and closing the gap to multi-step diffusion samplers, without using distillation (Geng et al., 1 Dec 2025).

6. Limitations and Future Directions

While IntMeanFlow provides efficient and accurate one-step or few-step generation, limitations remain. At very low NFE, slight degradations in output fidelity and fluency may occur. Optimization algorithms such as O3S require additional dev set runs. Prospective directions include:

  • Extending distillation and velocity learning to multi-speaker, prosody-control, and multimodal tasks.
  • Joint end-to-end learning of sampling schedules and velocity maps.
  • Adapting integral-velocity learning to video and audio generative domains.
  • Improving teacher guidance signals for further one-step fidelity gains.

A plausible implication is that the general principle of integral-velocity regression offers a scalable pathway for fast-forward generation across broad data modalities, provided that teacher reference trajectories can be efficiently obtained and the interval-based parameterization is expressive (Wang et al., 9 Oct 2025).

IntMeanFlow and MeanFlow variants derive their theoretical foundation from classical mean-velocity equations in statistical hydrodynamics (Piest, 2013) and geometric flows such as inverse mean curvature flow (IMCF) (Cui et al., 2023). The progression from turbulence closure and geometric PDEs to modern flow matching and time-averaged velocity learning reflects the increasing emphasis on integrating analytic identities, functional approximation, and computational tractability in generative modeling. IntMeanFlow crystallizes these themes, offering a comprehensive framework for unifying and accelerating flow-based sampling.
