
Modular MeanFlow: Unified One-Step Modeling

Updated 26 November 2025
  • Modular MeanFlow (MMF) is a framework that efficiently generates high-quality data samples in one step via time-averaged velocity regression.
  • It introduces a tunable gradient modulation mechanism with a curriculum warmup to balance training stability and model expressiveness.
  • Empirical results show state-of-the-art performance in image synthesis, low-data regimes, and out-of-distribution scenarios.

Modular MeanFlow (MMF) is a unifying framework for stable and scalable one-step generative modeling, developed to efficiently generate high-quality data samples via direct mapping in a single function evaluation. MMF generalizes and interpolates between flow-matching and consistency-based models by introducing a principled family of regression losses built upon time-averaged velocity fields. Central to its design are a differential identity linking instantaneous and averaged velocities, a tunable gradient modulation mechanism, and a curriculum-style warmup schedule for training stability and expressiveness. Empirically, MMF achieves state-of-the-art performance across image synthesis, low-data, out-of-distribution (OOD), and trajectory modeling tasks, while circumventing the computational burden of higher-order derivatives (You et al., 24 Aug 2025).

1. Theoretical Framework

MMF builds on the continuous-time generative model defined by the ordinary differential equation (ODE):

$$\frac{dx_t}{dt} = v(x_t, t), \qquad x_1 \sim p_\text{prior}, \quad x_0 \sim p_\text{data}$$

where $v(x_t, t)$ denotes the instantaneous velocity field parameterizing the mapping from $p_\text{prior}$ (usually a tractable distribution) to $p_\text{data}$. MMF introduces the time-averaged velocity field over the interval $[r, t]$:

$$u(x_t, r, t) := \frac{1}{t - r} \int_{r}^{t} v(x_\tau, \tau)\,d\tau$$

With Lipschitz assumptions on $v$, the averaged velocity recovers the instantaneous field as $t \to r$:

$$\lim_{t \to r} u(x_t, r, t) = v(x_r, r)$$

A key identity underpins MMF:

$$v(x_t, t) = u(x_t, r, t) + (t - r) \frac{d}{dt} u(x_t, r, t)$$

where $\frac{d}{dt}u = \partial_t u + (\nabla_x u)\,v(x_t, t)$. This relation enables the regression of averaged velocities and their time derivatives to approximate the model's functional path, decoupling expressiveness from the risk of instability intrinsic to higher-order supervision.
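A minimal sketch of this identity in PyTorch (the function and argument names are illustrative, not from the paper): the total derivative $\frac{d}{dt}u$ is obtained with a single forward-mode Jacobian-vector product, so no explicit second-order quantity is ever formed.

```python
import torch
from torch.func import jvp

def total_time_derivative(u_theta, x, r, t, v):
    """Compute u and d/dt u = ∂_t u + (∇_x u) v with one forward-mode JVP.

    u_theta: callable (x, r, t) -> time-averaged velocity u(x, r, t)
    v:       instantaneous velocity dx_t/dt evaluated at (x, t)
    """
    # Tangent direction: x moves with velocity v, r is held fixed, t advances at unit rate.
    tangents = (v, torch.zeros_like(r), torch.ones_like(t))
    u, du_dt = jvp(u_theta, (x, r, t), tangents)
    return u, du_dt
```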

2. Modular Loss Construction and Gradient Modulation

MMF defines a spectrum of regression losses parametrized both by the velocity-averaging interval and by a scalar $\lambda \in [0,1]$ controlling gradient flow. The "full" regression loss is given by:

$$\mathcal{L}_\text{full} = \mathbb{E}_{x_0, x_1, r < t} \left\| u_\theta(x_t, r, t) + (t-r)\Big(\partial_t u_\theta + \nabla_x u_\theta \cdot u_\theta\Big) - \frac{x_1 - x_0}{t - r} \right\|^2$$

To trade off training stability and functional expressiveness, a partial stop-gradient operator is introduced:

$$\mathrm{SG}_\lambda[z] := \lambda z + (1-\lambda)\,\mathrm{stopgrad}(z)$$

The MMF loss then generalizes as:

$$\mathcal{L}_\lambda = \mathbb{E}_{x_0, x_1, r < t} \left\| u_\theta(x_t, r, t) + (t-r)\,\mathrm{SG}_\lambda\left[\partial_t u_\theta + \nabla_x u_\theta \cdot \frac{x_1 - x_0}{t-r}\right] - \frac{x_1 - x_0}{t - r} \right\|^2$$

Here, $\lambda=1$ fully propagates gradients (maximum expressiveness but possible instability), $\lambda=0$ detaches Jacobian-vector products (maximum stability), and intermediate $\lambda$ governs a stability-expressiveness continuum. Explicitly blocking gradient flow through higher-order terms prevents gradient explosions and training oscillations.
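The following sketch (assuming PyTorch and a callable u_theta(x, r, t); all names are illustrative) puts the pieces together: the bracketed JVP term is computed by forward-mode autodiff and then passed through the partial stop-gradient $\mathrm{SG}_\lambda$ before entering the squared residual.

```python
import torch
from torch.func import jvp

def mmf_loss(u_theta, x0, x1, r, t, lam):
    """Sketch of the modular MeanFlow loss L_lambda.

    u_theta: callable (x, r, t) -> predicted time-averaged velocity
    x0, x1:  data and prior samples, shape (B, ...)
    r, t:    sampled times with r < t, broadcastable against x0
    lam:     gradient-modulation coefficient lambda in [0, 1]
    """
    alpha = (t - r) / (1 - r)              # interpolation weight from the training protocol below
    xt = (1 - alpha) * x0 + alpha * x1     # point on the path at time t
    target = (x1 - x0) / (t - r)           # displacement-based regression target

    # Forward-mode JVP gives ∂_t u + ∇_x u · target in a single pass.
    u, du_dt = jvp(u_theta, (xt, r, t),
                   (target, torch.zeros_like(r), torch.ones_like(t)))

    # Partial stop-gradient: SG_lambda[z] = lam * z + (1 - lam) * z.detach()
    sg_du_dt = lam * du_dt + (1 - lam) * du_dt.detach()

    residual = u + (t - r) * sg_du_dt - target
    return residual.pow(2).mean()
```

Setting lam=0 detaches the JVP term entirely (maximum stability), while lam=1 backpropagates through it in full (maximum expressiveness), mirroring the continuum described above.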

3. Curriculum Warmup and Training Protocol

MMF adopts a curriculum for $\lambda$:

$$\lambda(t_\mathrm{train}) = \min\left(1, \frac{t_\mathrm{train}}{T_\mathrm{warmup}}\right), \qquad T_\mathrm{warmup} \approx 10\%\ \text{of total steps}$$

In early training ($\lambda \approx 0$), MMF behaves as a consistency or flow-matching model, yielding high stability by restricting second-order signal propagation. As $\lambda \to 1$, expressive gradients are introduced, allowing the model to capture richer curvature and achieve lower asymptotic loss. Empirically, this curriculum schedule yields both rapid convergence and low variance in training.
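In code, the schedule is a one-line linear ramp (a sketch; the argument names are illustrative):

```python
def curriculum_lambda(step: int, warmup_steps: int) -> float:
    """Linear warmup: lambda rises from 0 to 1 over warmup_steps, then stays at 1."""
    return min(1.0, step / warmup_steps)
```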

A typical MMF training protocol:

Given: network u_θ, total steps N, warmup T_warmup
for step = 1 to N do
    sample x₀ ∼ p_data,  x₁ ∼ p_prior
    sample r ∼ U[0,1), t ∼ U(r,1]
    α ← (t - r)/(1 - r)
    xₜ ← (1 - α)·x₀ + α·x₁
    λ ← min(1, step / T_warmup)
    # compute loss ℒ_λ using SG_λ on the JVP term
    L ← ‖ u_θ(xₜ, r, t)
          + (t - r)·SG_λ[∂ₜu_θ + ∇ₓu_θ·((x₁ - x₀)/(t - r))]
          - (x₁ - x₀)/(t - r) ‖²
    θ ← θ - AdamStep(∇_θ L)
end for

Sampling proceeds in one step: $\hat{x}_0 = x_1 - u_\theta(x_1, r{=}0, t{=}1)$.
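A one-step sampler under the same assumptions as the sketches above (PyTorch; a standard Gaussian prior is assumed, and the time arguments are shaped as (B, 1) columns to match the loss sketch):

```python
import torch

@torch.no_grad()
def sample_one_step(u_theta, batch_size, dim, device="cpu"):
    """Draw x_1 from the prior and subtract the averaged velocity over [0, 1]."""
    x1 = torch.randn(batch_size, dim, device=device)   # x_1 ~ p_prior (assumed Gaussian)
    r = torch.zeros(batch_size, 1, device=device)
    t = torch.ones(batch_size, 1, device=device)
    return x1 - u_theta(x1, r, t)                       # x_hat_0 = x_1 - u_theta(x_1, 0, 1)
```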

4. Connections to Prior Methods

MMF subsumes prior consistency and flow-matching models as special cases:

  • Consistency Models: Fixing $(r, t) \equiv (0, 1)$ and $\lambda=0$ recovers the fixed-time consistency loss $\|u_\theta(x_t, 0, 1) - (x_1 - x_0)\|^2$.
  • Flow Matching: The instantaneous limit $t \to r$, with $u_\theta \to v_\theta$ and $(t-r)^{-1}(x_1 - x_0) \to v(x_r, r)$, recovers the flow-matching loss $\|v_\theta(x_r, r) - v_\mathrm{true}(x_r, r)\|^2$.
  • Gradient Efficiency: Applying the stop-gradient to the Jacobian-vector term ensures that backward computation never traverses $\nabla^2 u_\theta$, eliminating the $\mathcal{O}(d^2)$ cost and Hessian-vector product overhead.

This unification allows MMF to inherit the interpretability and theoretical properties of both frameworks, while providing a tunable control for interpolation between them.
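As a usage illustration (reusing the hypothetical mmf_loss sketch from Section 2, with placeholder tensors and a placeholder network), the consistency-model special case corresponds to pinning the endpoints and switching off gradient flow through the JVP term:

```python
import torch

B, D = 8, 2
x0 = torch.randn(B, D)               # stand-in for x_0 ~ p_data
x1 = torch.randn(B, D)               # stand-in for x_1 ~ p_prior
u_theta = lambda x, r, t: 0.5 * x    # placeholder network; any (x, r, t) -> R^D callable works

# (r, t) = (0, 1) and lambda = 0: the consistency-model special case listed above.
r = torch.zeros(B, 1)
t = torch.ones(B, 1)
loss = mmf_loss(u_theta, x0, x1, r, t, lam=0.0)
```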

5. Empirical Results and Model Analysis

MMF's empirical evaluation focuses on image synthesis, robustness, and path modeling:

| Model | FID (↓) | 1-step MSE (↓) | LPIPS (↓) | Inference Time (s) (↓) |
|---|---|---|---|---|
| MeanFlow (full) | 3.91 | 0.087 | 0.132 | 0.031 |
| MeanFlow (stop-grad) | 4.27 | 0.095 | 0.156 | 0.024 |
| MMF (λ=0) | 4.19 | 0.093 | 0.148 | 0.023 |
| MMF (λ=0.5) | 3.78 | 0.084 | 0.120 | 0.026 |
| MMF (λ=1) | 3.62 | 0.080 | 0.109 | 0.034 |
| MMF (curriculum) | 3.41 | 0.076 | 0.097 | 0.025 |

On CIFAR-10 and ImageNet-64, curriculum MMF achieves the lowest FID, lowest 1-step MSE, and best LPIPS, matching or exceeding the efficiency of prior mean-flow and consistency baselines. Few-shot and OOD experiments show that curriculum MMF retains low FID even with as little as 1% of CIFAR-10 data and achieves 10–20% lower FID in OOD settings (SVHN, STL-10, CIFAR-C) compared to baselines. In ODE-fitting and 2D control tasks, curriculum MMF yields smooth, accurate paths, outperforming noisy full-gradient and oversmoothed stop-gradient alternatives.

Path deviation is formalized as:

$$\mathcal{D}_\mathrm{path} = \mathbb{E}\big\| (s - r)\, u(x_s, r, s) + (t - s)\, u(x_t, s, t) - (t - r)\, u(x_t, r, t) \big\|$$

Curriculum MMF achieves the lowest $\mathcal{D}_\mathrm{path}$, supporting latent interpolation smoothness.
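A sketch of estimating this quantity for a batch of split points $r < s < t$ (names are illustrative; u_theta is the same kind of callable as in the earlier sketches):

```python
import torch

def path_deviation(u_theta, x_s, x_t, r, s, t):
    """Monte Carlo estimate of D_path: deviation of averaged velocities from
    additivity over the split [r, s] + [s, t] versus the full interval [r, t]."""
    lhs = (s - r) * u_theta(x_s, r, s) + (t - s) * u_theta(x_t, s, t)
    rhs = (t - r) * u_theta(x_t, r, t)
    # Per-sample norm over feature dimensions, then mean over the batch.
    return (lhs - rhs).flatten(1).norm(dim=1).mean()
```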

6. Ablations and Practicalities

Extensive ablations reveal that:

  • Varying $\lambda$: $\lambda=0$ yields maximum stability but higher FID (underfitting curvature). $\lambda=1$ is most expressive but unstable (loss oscillations). $\lambda=0.5$ provides some smoothing but exhibits late-stage variance. The curriculum $\lambda(t)$ combines low early variance with the best final performance.
  • Curriculum Horizon: A short warmup (small $T_\mathrm{warmup}$) induces early instability, while an overly long warmup is too conservative and slows convergence. The optimum is $T_\mathrm{warmup} \approx 10$–$15\%$ of total steps.
  • Compute: Forward-mode autodiff for the Jacobian-vector product adds roughly 15% overhead; with the stop-gradient applied, no backward pass is needed through this term.

The standard MMF implementation uses a UNet architecture with sinusoidal time embeddings, the Adam optimizer (learning rate $1 \times 10^{-4}$, batch size 128, cosine decay), and a curriculum warmup over 100k steps.
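A hedged end-to-end sketch of this setup (the tiny network, toy data, and step count are placeholders; only the Adam learning rate, batch size, cosine decay, and warmup fraction come from the text; mmf_loss is the sketch from Section 2):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Toy stand-in for the UNet with sinusoidal time embeddings described above."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 2, hidden), nn.SiLU(), nn.Linear(hidden, dim))
    def forward(self, x, r, t):
        return self.net(torch.cat([x, r, t], dim=-1))

model = TinyNet(dim=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    # lr 1e-4 as reported
total_steps = 1000                                            # illustrative; real runs are far longer
warmup_steps = max(1, total_steps // 10)                      # ~10% curriculum warmup
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    x0 = torch.randn(128, 2)                                  # toy stand-in for a data batch (batch size 128)
    x1 = torch.randn(128, 2)                                  # prior samples
    r = torch.rand(128, 1)
    t = r + (1 - r) * torch.rand(128, 1).clamp_min(1e-3)      # ensures r < t <= 1
    lam = min(1.0, (step + 1) / warmup_steps)                 # curriculum lambda
    loss = mmf_loss(model, x0, x1, r, t, lam)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```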

7. Significance, Limitations, and Outlook

MMF provides a theoretically grounded, computationally efficient, and practically robust approach for one-step generative modeling. By enabling a tunable spectrum between expressiveness and stability—mediated by gradient modulation and curriculum scheduling—it addresses the instability and inefficiency intrinsic to prior higher-order methods. MMF’s empirical results demonstrate high generalization under low-data and out-of-distribution regimes and applicability beyond image synthesis to trajectory modeling. A plausible implication is that MMF may be extensible to other domains requiring stable, one-shot sampling of complex data distributions via learnable ODE flows (You et al., 24 Aug 2025).
