Modular MeanFlow: Unified One-Step Modeling
- Modular MeanFlow (MMF) is a framework that efficiently generates high-quality data samples in one step via time-averaged velocity regression.
- It introduces a tunable gradient modulation mechanism with a curriculum warmup to balance training stability and model expressiveness.
- Empirical results show state-of-the-art performance in image synthesis, low-data regimes, and out-of-distribution scenarios.
Modular MeanFlow (MMF) is a unifying framework for stable and scalable one-step generative modeling, developed to efficiently generate high-quality data samples via direct mapping in a single function evaluation. MMF generalizes and interpolates between flow-matching and consistency-based models by introducing a principled family of regression losses built upon time-averaged velocity fields. Central to its design are a differential identity linking instantaneous and averaged velocities, a tunable gradient modulation mechanism, and a curriculum-style warmup schedule for training stability and expressiveness. Empirically, MMF achieves state-of-the-art performance across image synthesis, low-data, out-of-distribution (OOD), and trajectory modeling tasks, while circumventing the computational burden of higher-order derivatives (You et al., 24 Aug 2025).
1. Theoretical Framework
MMF builds on the continuous-time generative model defined by the ordinary differential equation (ODE)

$$\frac{dx_t}{dt} = v(x_t, t), \qquad t \in [0, 1],$$

where $v(x_t, t)$ denotes the instantaneous velocity field parameterizing the mapping from the prior $p_{\text{prior}}$ (usually a tractable distribution) to the data distribution $p_{\text{data}}$. MMF introduces the time-averaged velocity field over the interval $[r, t]$:

$$u(x_t, r, t) = \frac{1}{t - r} \int_r^t v(x_\tau, \tau)\, d\tau.$$

With Lipschitz assumptions on $v$, the averaged velocity recovers the instantaneous field as $r \to t$:

$$\lim_{r \to t} u(x_t, r, t) = v(x_t, t).$$

A key identity underpins MMF:

$$u(x_t, r, t) = v(x_t, t) - (t - r)\, \frac{d}{dt} u(x_t, r, t),$$

where $\frac{d}{dt} u = \partial_t u + v \cdot \nabla_x u$ is the total derivative along the trajectory. This relation enables the regression of averaged velocities and their time derivatives to approximate the model's functional path, decoupling expressiveness from the risk of instability intrinsic to higher-order supervision.
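The identity follows directly from the definitions above; a short derivation (included here for completeness) differentiates $(t - r)\, u(x_t, r, t) = \int_r^t v(x_\tau, \tau)\, d\tau$ with respect to $t$, holding $r$ fixed and using $dx_t/dt = v(x_t, t)$:

$$\frac{d}{dt}\Big[(t - r)\, u(x_t, r, t)\Big] = u(x_t, r, t) + (t - r)\, \frac{d}{dt} u(x_t, r, t) = v(x_t, t),$$

which rearranges to the identity. Only first derivatives of $u$ appear, which is why regression targets built on it avoid explicit second-order supervision.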
2. Modular Loss Construction and Gradient Modulation
MMF defines a spectrum of regression losses parametrized both by the velocity averaging interval $[r, t]$ and a scalar $\lambda \in [0, 1]$ controlling gradient flow. The “full” regression loss is given by:

$$\mathcal{L}_{\text{full}}(\theta) = \mathbb{E}\left[\Big\| u_\theta(x_t, r, t) + (t - r)\big(\partial_t u_\theta + v \cdot \nabla_x u_\theta\big) - v \Big\|^2\right],$$

with $v$ the target velocity (in practice, the conditional velocity of the sampled interpolation path, as in the training protocol below). To trade off training stability and functional expressiveness, a partial stop-gradient operator is introduced:

$$\mathrm{SG}_\lambda[z] = \lambda\, z + (1 - \lambda)\, \mathrm{sg}[z],$$

where $\mathrm{sg}[\cdot]$ denotes the standard stop-gradient (detach) operation. The MMF loss then generalizes as:

$$\mathcal{L}_\lambda(\theta) = \mathbb{E}\left[\Big\| u_\theta(x_t, r, t) + (t - r)\, \mathrm{SG}_\lambda\big[\partial_t u_\theta + v \cdot \nabla_x u_\theta\big] - v \Big\|^2\right].$$

Here, $\lambda = 1$ fully propagates gradients (maximum expressiveness but possible instability), $\lambda = 0$ detaches Jacobian-vector products (maximum stability), and intermediate $\lambda \in (0, 1)$ governs a stability-expressiveness continuum. Explicitly blocking gradient flow through higher-order terms prevents gradient explosions and training oscillations.
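In an autodiff framework, $\mathrm{SG}_\lambda$ is a one-line convex combination of a tensor and its detached copy. A minimal PyTorch sketch (the helper name is illustrative, not from the paper):

```python
import torch

def partial_stop_grad(z: torch.Tensor, lam: float) -> torch.Tensor:
    """SG_λ[z] = λ·z + (1 − λ)·sg[z]: the forward value equals z for every λ,
    but only a λ-scaled fraction of the gradient flows back through z."""
    return lam * z + (1.0 - lam) * z.detach()
```

Because the forward value is unchanged, varying $\lambda$ modulates only the backward signal: the loss value at a fixed parameter state is the same for every $\lambda$.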
3. Curriculum Warmup and Training Protocol
MMF adopts a curriculum for $\lambda$, linearly ramping it from $0$ to $1$ over a warmup horizon $T_{\text{warmup}}$:

$$\lambda(\text{step}) = \min\!\left(1, \frac{\text{step}}{T_{\text{warmup}}}\right).$$

In early training ($\lambda \approx 0$), MMF behaves as a consistency or flow-matching model, yielding high stability by restricting second-order signal propagation. As $\lambda \to 1$, expressive gradients are introduced, allowing the model to capture richer curvature and achieve lower asymptotic loss. Empirically, this curriculum schedule yields both rapid convergence and low variance in training.
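The ramp amounts to a one-line schedule; as a sketch (the 100k-step warmup value echoes the training setup reported later in this article):

```python
def lambda_schedule(step: int, t_warmup: int) -> float:
    """Linear curriculum: λ ramps from 0 to 1 over the first t_warmup steps, then stays at 1."""
    return min(1.0, step / t_warmup)

# With t_warmup = 100_000:  λ(0) = 0.0,  λ(50_000) = 0.5,  λ(≥ 100_000) = 1.0
```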
A typical MMF training protocol:
```
Given: network u_θ, total steps N, warmup horizon T_warmup
for step = 1 to N do
    sample x₀ ∼ p_data, x₁ ∼ p_prior
    sample r ∈ [0, 1), t ∈ (r, 1]
    α ← (t - r) / (1 - r)
    xₜ ← (1 - α)·x₀ + α·x₁
    λ ← min(1, step / T_warmup)
    # compute loss ℒ_λ using SG_λ on the JVP term
    L ← ‖ u_θ(xₜ, r, t) + (t - r)·SG_λ[∂ₜu + ∇ₓu·((x₁ - x₀)/(t - r))] - (x₁ - x₀)/(t - r) ‖²
    θ ← θ - AdamStep(∇_θ L)
end for
```
Sampling proceeds in one step: draw $x_1 \sim p_{\text{prior}}$ and return $\hat{x}_0 = x_1 - u_\theta(x_1, 0, 1)$, i.e., a single evaluation of the learned averaged velocity over the full interval $[0, 1]$.
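For concreteness, the following PyTorch sketch mirrors the training protocol and one-step sampler above. It is an illustrative implementation under stated assumptions rather than the authors' code: a toy MLP stands in for the UNet, flat $(B, D)$ data is assumed, and the Jacobian-vector product is taken with forward-mode autodiff (`torch.func.jvp`); when $\lambda > 0$ the backward pass additionally differentiates through that tangent.

```python
import torch
import torch.nn as nn
from torch.func import jvp

class MeanVelocityMLP(nn.Module):
    """Toy u_θ(x, r, t) for flat (B, D) data; a stand-in for the UNet used in practice."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 2, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, r, t):
        return self.net(torch.cat([x, r, t], dim=-1))

def mmf_loss(u_theta, x0, x1, lam):
    """L_λ = E‖ u_θ(x_t, r, t) + (t-r)·SG_λ[d/dt u_θ] - v ‖², with the interpolation
    and conditional velocity v taken from the pseudocode above."""
    B = x0.shape[0]
    r = torch.rand(B, 1, device=x0.device)
    # t ∈ (r, 1], kept away from r so that (t - r) never divides by ~0
    t = r + (1.0 - r) * torch.rand(B, 1, device=x0.device).clamp_min(1e-3)
    alpha = (t - r) / (1.0 - r)
    x_t = (1.0 - alpha) * x0 + alpha * x1
    v = (x1 - x0) / (t - r)                          # conditional target velocity

    # d/dt u_θ = ∂_t u_θ + ∇_x u_θ · v, computed with a single forward-mode JVP
    u, du_dt = jvp(u_theta, (x_t, r, t), (v, torch.zeros_like(r), torch.ones_like(t)))

    sg_du_dt = lam * du_dt + (1.0 - lam) * du_dt.detach()   # partial stop-gradient SG_λ
    return (u + (t - r) * sg_du_dt - v).pow(2).mean()

@torch.no_grad()
def sample_one_step(u_theta, x1):
    """One-step generation: x̂₀ = x₁ − u_θ(x₁, r=0, t=1), with x₁ ∼ p_prior."""
    zeros = torch.zeros(x1.shape[0], 1, device=x1.device)
    ones = torch.ones(x1.shape[0], 1, device=x1.device)
    return x1 - u_theta(x1, zeros, ones)
```

With the stop-gradient fully applied ($\lambda = 0$), the detached tangent keeps the backward pass first-order, matching the compute discussion in the ablations below.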
4. Connections to Prior Methods
MMF subsumes prior consistency and flow-matching models as special cases:
- Consistency Models: Fixing the reference time $r = 0$ and applying the full stop-gradient ($\lambda = 0$) recovers a fixed-time consistency-style loss, in which $u_\theta(x_t, 0, t)$ is regressed toward a gradient-detached target.
- Flow Matching: The instantaneous limit $r \to t$, in which $u(x_t, r, t) \to v(x_t, t)$ and the $(t - r)$-weighted JVP term vanishes, recovers the flow-matching loss $\mathbb{E}\big[\|u_\theta(x_t, t, t) - v(x_t, t)\|^2\big]$.
- Gradient Efficiency: Applying stop-gradient to the Jacobian-vector term ensures that backward computation never traverses $\partial_t u_\theta + v \cdot \nabla_x u_\theta$, eliminating second-order backpropagation cost and Hessian-vector product overhead.
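Written out from $\mathcal{L}_\lambda$ above (a restatement under the parameter choices named in the bullets, not a new result), the consistency-style case regresses onto a fully detached target, while the instantaneous limit drops the JVP term entirely:

$$\mathcal{L}_{0}\big|_{r=0} = \mathbb{E}\left[\Big\| u_\theta(x_t, 0, t) - \mathrm{sg}\big[v - t\,(\partial_t u_\theta + v \cdot \nabla_x u_\theta)\big] \Big\|^2\right], \qquad \lim_{r \to t} \mathcal{L}_\lambda = \mathbb{E}\left[\big\| u_\theta(x_t, t, t) - v(x_t, t) \big\|^2\right].$$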
This unification allows MMF to inherit the interpretability and theoretical properties of both frameworks, while providing a tunable control for interpolation between them.
5. Empirical Results and Model Analysis
MMF's empirical evaluation focuses on image synthesis, robustness, and path modeling:
| Model | FID (↓) | 1-step MSE (↓) | LPIPS (↓) | Inference Time (s) (↓) |
|---|---|---|---|---|
| MeanFlow (full) | 3.91 | 0.087 | 0.132 | 0.031 |
| MeanFlow (stop-grad) | 4.27 | 0.095 | 0.156 | 0.024 |
| MMF (λ=0) | 4.19 | 0.093 | 0.148 | 0.023 |
| MMF (λ=0.5) | 3.78 | 0.084 | 0.120 | 0.026 |
| MMF (λ=1) | 3.62 | 0.080 | 0.109 | 0.034 |
| MMF (curriculum) | 3.41 | 0.076 | 0.097 | 0.025 |
On CIFAR-10 and ImageNet-64, curriculum MMF achieves the lowest FID, lowest 1-step MSE, and the best LPIPS score, matching or exceeding the efficiency of prior MeanFlow and consistency baselines. Few-shot and OOD experiments demonstrate that curriculum MMF retains low FID even with as little as 1% of CIFAR-10 data and achieves 10–20% lower FID in OOD settings (SVHN, STL-10, CIFAR-C) compared to baselines. In ODE-fitting and 2D control tasks, curriculum MMF yields smooth, accurate paths, outperforming the noisy full-gradient and oversmoothed stop-gradient alternatives.
Path deviation is quantified as the discrepancy between the model's generated trajectory and the straight interpolation path between its endpoints. Curriculum MMF achieves the lowest path deviation, supporting latent interpolation smoothness.
6. Ablations and Practicalities
Extensive ablations reveal that:
- Varying $\lambda$: $\lambda = 0$ yields maximum stability but higher FID (underfitting curvature). $\lambda = 1$ is most expressive but unstable (loss oscillations). $\lambda = 0.5$ provides some smoothing but with late-stage variance. The curriculum schedule for $\lambda$ combines low early variance with the best final performance.
- Curriculum Horizon: A short warmup (small $T_{\text{warmup}}$) induces early instability; a long warmup is too conservative and slows convergence. The best results come from an intermediate warmup horizon spanning a modest fraction of the total training steps.
- Compute: Forward-mode autodiff for the Jacobian-vector product adds roughly 15% overhead; with the stop-gradient applied, no backward pass is required through this term.
The standard MMF implementation uses a UNet architecture with sinusoidal time embeddings, the Adam optimizer (batch size 128, cosine learning-rate decay), and a curriculum warmup over 100k steps.
7. Significance, Limitations, and Outlook
MMF provides a theoretically grounded, computationally efficient, and practically robust approach for one-step generative modeling. By enabling a tunable spectrum between expressiveness and stability—mediated by gradient modulation and curriculum scheduling—it addresses the instability and inefficiency intrinsic to prior higher-order methods. MMF’s empirical results demonstrate high generalization under low-data and out-of-distribution regimes and applicability beyond image synthesis to trajectory modeling. A plausible implication is that MMF may be extensible to other domains requiring stable, one-shot sampling of complex data distributions via learnable ODE flows (You et al., 24 Aug 2025).