SoFlow: Direct Generative ODE Modeling
- SoFlow is a generative modeling framework that directly learns the closed-form solution of the velocity ODE underlying diffusion-based models, enabling efficient one- or few-step data generation.
- It utilizes a novel parameterization with complementary Flow Matching and Solution Consistency losses and leverages Diffusion Transformers in VAE latent space for superior performance.
- The method significantly reduces GPU memory usage and training time while achieving competitive FID scores compared to traditional multi-step diffusion and GAN approaches.
Solution Flow Models (SoFlow) constitute a generative modeling framework that directly learns the closed-form solution of the velocity ordinary differential equation (ODE) underlying diffusion-based models, enabling efficient one-step or few-step data sample generation. By explicitly modeling the mapping from a latent prior to data in a single network pass, SoFlow overcomes the inefficiency of traditional multi-step denoising approaches. The approach is characterized by a novel parameterization of the generative ODE’s solution, a pair of complementary loss functions—Flow Matching and Solution Consistency—and an architecture leveraging Diffusion Transformers (DiT) in VAE latent space, achieving state-of-the-art performance among one-step generative models on ImageNet 256×256 (Luo et al., 17 Dec 2025).
1. Mathematical Structure of the SoFlow Framework
SoFlow starts from a continuous interpolant (“noising process”) bridging data $x \sim p_{\text{data}}$ and a tractable prior $\epsilon \sim \mathcal{N}(0, I)$:

$$x_t = a_t\,x + b_t\,\epsilon, \qquad a_0 = 1,\; b_0 = 0,\; a_1 = 0,\; b_1 = 1,$$

so that $x_0$ is a data sample and $x_1$ is pure noise. This yields a marginal velocity field

$$v(x_t, t) = \mathbb{E}\big[\dot a_t\,x + \dot b_t\,\epsilon \,\big|\, x_t\big],$$

defining the generative ODE

$$\frac{dx_t}{dt} = v(x_t, t),$$

to be solved backward in time. Rather than numerically integrating this ODE, SoFlow directly learns its solution function $f_\theta(x_t, t, s)$, satisfying

$$f_\theta(x_t, t, s) = x_t + \int_t^{s} v(x_\tau, \tau)\,d\tau = x_s$$

for $0 \le s \le t \le 1$. Thus $f_\theta$ instantaneously maps any $(x_t, t)$ to $x_s$ in closed form, fundamentally distinguishing SoFlow from velocity-based diffusion and flow-matching models (Luo et al., 17 Dec 2025).
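As a concrete illustration of what the solution function encodes, consider the linear schedule used in the experiments (Section 3) along a single conditional path; this is a motivating special case under standard flow-matching conventions, not the general marginal construction:

$$x_t = (1 - t)\,x + t\,\epsilon, \qquad \dot a_t\,x + \dot b_t\,\epsilon = \epsilon - x, \qquad x_s = x_t + (s - t)\,(\epsilon - x).$$

Here the conditional path is a straight line and its exact solution map is a single Euler step of length $s - t$, which motivates the Euler-style parameterizations $f_\theta(x_t, t, s) = x_t + (s - t)\,F_\theta(x_t, t, s)$ discussed below.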
2. Loss Function Design: Flow Matching and Solution Consistency
The parametric solution map is

$$f_\theta(x_t, t, s) = c_{\text{skip}}(t, s)\,x_t + c_{\text{out}}(t, s)\,F_\theta(x_t, t, s),$$

where $c_{\text{skip}}$ and $c_{\text{out}}$ are known coefficient functions (e.g., Euler or trigonometric parameterizations), $F_\theta$ is a neural network, and $c_{\text{skip}}(t, t) = 1$, $c_{\text{out}}(t, t) = 0$, so that $f_\theta(x_t, t, t) = x_t$ by construction. The Flow Matching loss anchors the network’s instantaneous velocity:

$$\mathcal{L}_{\text{FM}} = \mathbb{E}_{x,\,\epsilon,\,t}\Big[w(t)\,\big\|v_\theta(x_t, t) - (\dot a_t\,x + \dot b_t\,\epsilon)\big\|_2^2\Big],$$

with $w(t)$ an adaptive weight and $t$ sampled logit-normal. The correct instantaneous velocity is enforced via analytic differentiation of the solution map at $s = t$,

$$v_\theta(x_t, t) = \partial_s f_\theta(x_t, t, s)\big|_{s=t},$$

which can be evaluated in closed form under the parameterization above.
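A minimal PyTorch-style sketch of this construction, assuming the Euler instance $f_\theta(x_t, t, s) = x_t + (s - t)\,F_\theta(x_t, t, s)$ and the linear schedule (function and variable names are illustrative, not taken from the paper):

```python
import torch

def solution_map(net, x_t, t, s):
    """Euler-parameterized solution map f(x_t, t, s) = x_t + (s - t) * F(x_t, t, s).

    The boundary condition f(x_t, t, t) = x_t holds by construction.
    `net` is any module taking (x, t, s) and returning a tensor shaped like x.
    """
    return x_t + (s - t).view(-1, 1, 1, 1) * net(x_t, t, s)

def flow_matching_loss(net, x, eps, t, w=None):
    """Flow Matching loss anchoring the instantaneous velocity at s = t.

    Under the linear schedule, the conditional velocity target is eps - x, and
    d/ds f(x_t, t, s) at s = t equals F(x_t, t, t), so no JVP is needed.
    """
    t_ = t.view(-1, 1, 1, 1)
    x_t = (1.0 - t_) * x + t_ * eps           # linear interpolant x_t = (1 - t) x + t eps
    v_target = eps - x                        # conditional velocity for the linear schedule
    v_pred = net(x_t, t, t)                   # analytic velocity of the solution map at s = t
    err = (v_pred - v_target) ** 2
    if w is not None:                         # optional adaptive per-sample weight w(t)
        err = w.view(-1, 1, 1, 1) * err
    return err.mean()
```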
Classifier-Free Guidance (CFG) is incorporated by interpolating class-conditional and unconditional velocity estimates during loss computation.
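The standard CFG combination of velocity estimates (a generic formula; the precise point in the loss at which SoFlow applies it follows the paper) is

$$v_{\text{cfg}}(x_t, t, c) = v_\theta(x_t, t, \varnothing) + \omega\,\big(v_\theta(x_t, t, c) - v_\theta(x_t, t, \varnothing)\big),$$

where $c$ is the class label, $\varnothing$ the null (unconditional) label, and $\omega$ the guidance scale ($\omega = 1$ recovers the purely conditional velocity).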
To guarantee correct solution mapping over finite intervals, SoFlow introduces a Solution Consistency loss:

$$\mathcal{L}_{\text{SC}} = \mathbb{E}_{x,\,\epsilon,\,s \le r \le t}\Big[\bar w\,\big\|f_\theta(x_t, t, s) - f_{\theta^-}\big(f_{\theta^-}(x_t, t, r),\, r,\, s\big)\big\|_2^2\Big],$$

where $f_{\theta^-}$ is a stopped-gradient network copy and $\bar w$ is an adaptive weight. The final objective is a weighted sum

$$\mathcal{L} = \mathcal{L}_{\text{FM}} + \lambda\,\mathcal{L}_{\text{SC}},$$

with $\lambda$ a scalar balancing coefficient. Notably, $\mathcal{L}_{\text{SC}}$ involves no Jacobian-vector products, yielding superior training efficiency compared to flow-anchored objectives (Luo et al., 17 Dec 2025).
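A sketch of the consistency term under the same assumptions, reusing `solution_map` from the previous snippet; how $r$ is drawn and how the stopped-gradient copy is maintained (e.g., via EMA) follows the paper and is only approximated here with `torch.no_grad()`:

```python
import torch

def solution_consistency_loss(net, x_t, t, r, s, w_bar=None):
    """Solution Consistency loss sketch (no Jacobian-vector products).

    Enforces the semigroup property of the solution map: mapping t -> s directly
    should agree with mapping t -> r -> s. The two-hop target is computed under
    torch.no_grad() as a stand-in for the stopped-gradient network copy.
    """
    pred = solution_map(net, x_t, t, s)        # gradients flow through this branch only
    with torch.no_grad():
        x_r = solution_map(net, x_t, t, r)     # intermediate state at time r
        target = solution_map(net, x_r, r, s)  # two-hop target at time s
    err = (pred - target) ** 2
    if w_bar is not None:                      # optional adaptive weight
        err = w_bar.view(-1, 1, 1, 1) * err
    return err.mean()

# Total objective (lambda_sc is a balancing hyperparameter, value per the paper):
# loss = flow_matching_loss(...) + lambda_sc * solution_consistency_loss(...)
```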
3. Model Architecture and Implementation Protocols
SoFlow adopts the Diffusion Transformer (DiT) backbone, operating in VAE latent space (32×32×4) for ImageNet 256×256 generation. Model variants include B/2 (131M), M/2 (308M), L/2 (459M), and XL/2 (676M) parameters, all with 2×2 patches. Training from scratch uses a batch size of 256 for 240 epochs with the AdamW optimizer (betas = (0.9, 0.99), constant learning rate), no weight decay or learning-rate decay, and an EMA decay of 0.9999; a brief configuration sketch follows.
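A minimal sketch of the implied token geometry and optimizer configuration (the learning rate below is a placeholder, not the paper's value, and the linear layer is only a stand-in for the DiT backbone):

```python
import torch

# Token geometry implied by the sizes above: 32x32x4 latents with 2x2 patches give
# (32 // 2) ** 2 = 256 tokens, each flattening 2 * 2 * 4 = 16 latent values.
latent_hw, latent_c, patch = 32, 4, 2
num_tokens = (latent_hw // patch) ** 2        # 256
patch_dim = patch * patch * latent_c          # 16

# Optimizer configuration as described above; the learning rate is a placeholder.
backbone = torch.nn.Linear(patch_dim, patch_dim)      # stand-in for the DiT backbone
optimizer = torch.optim.AdamW(backbone.parameters(),
                              lr=1e-4,                # placeholder, not the paper's value
                              betas=(0.9, 0.99),
                              weight_decay=0.0)
ema_decay = 0.9999
```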
Hyperparameters:
- Time sampling: logit-normal distributions for both losses, with separate location/scale parameters for $\mathcal{L}_{\text{FM}}$ and $\mathcal{L}_{\text{SC}}$ (see the sampling sketch after this list).
- Noising schedule: linear ($a_t = 1 - t$, $b_t = t$) with the Euler parameterization.
- CFG strength and the velocity-mix coefficient are tuned per model size; the guidance strength decays from 2.5/2.0 to 1.0 for the larger settings.
- CIFAR-10 experiments use a U-Net backbone, RAdam, batch size 1024, 800K steps, analogous settings (Luo et al., 17 Dec 2025).
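The logit-normal time sampling referenced above can be sketched as follows (the location/scale values are placeholders; the paper specifies separate settings per loss):

```python
import torch

def sample_logit_normal(batch, mu=0.0, sigma=1.0):
    """Draw t in (0, 1) from a logit-normal distribution: t = sigmoid(mu + sigma * z).

    mu and sigma are placeholders; the paper specifies separate values for the
    Flow Matching and Solution Consistency losses.
    """
    return torch.sigmoid(mu + sigma * torch.randn(batch))

# Example: times for the FM loss, and (t, s) pairs with 0 <= s <= t for the SC loss.
t_fm = sample_logit_normal(256)
t_sc = sample_logit_normal(256)
s_sc = t_sc * torch.rand(256)                 # illustrative choice of the target time s
```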
4. Empirical Performance and Benchmarks
On ImageNet 256×256, SoFlow sets new FID-50K standards among one-step generative models across all tested DiT model scales:
| Method | Params | 1–NFE FID |
|---|---|---|
| MeanFlow B/2 | 131 M | 6.17 |
| SoFlow B/2 | 131 M | 4.85 |
| MeanFlow M/2 | 308 M | 5.01 |
| SoFlow M/2 | 308 M | 3.73 |
| MeanFlow L/2 | 459 M | 3.84 |
| SoFlow L/2 | 459 M | 3.20 |
| MeanFlow XL/2 | 676 M | 3.43 |
| SoFlow XL/2 | 676 M | 2.96 |
For two function evaluations, SoFlow XL/2 achieves 2.66 FID (vs. 2.93 for MeanFlow XL/2). These results are competitive with multi-step diffusion, autoregressive, and GAN methods at comparable or lower NFE, with SoFlow’s performance realized at significantly reduced inference cost. SoFlow consistently outperforms MeanFlow, the previous strongest one-step baseline (Luo et al., 17 Dec 2025).
5. Sampling, Inference Efficiency, and Practical Implications
SoFlow’s generative sampling proceeds as:
- Sample $x_1 \sim \mathcal{N}(0, I)$;
- Produce $\hat{x}_0 = f_\theta(x_1, 1, 0)$ via a single forward pass.
Optional few-step sampling is possible by perturbing intermediate estimates and recursively invoking the solution map (a minimal sketch is given at the end of this section). The Solution Consistency loss eliminates JVPs, reducing peak GPU memory by 31% and enabling 23% faster training than MeanFlow on H100 GPUs. SoFlow inherits the computational efficiency of state-of-the-art attention kernels used in the DiT backbone (Luo et al., 17 Dec 2025).
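A sketch of one-step and few-step sampling under the conventions used above, reusing `solution_map` from Section 2; the few-step schedule and the re-noising rule are illustrative assumptions, not the paper's exact recipe:

```python
import torch

@torch.no_grad()
def sample_one_step(net, shape):
    """One-step generation: map pure noise at t = 1 directly to a sample at t = 0."""
    x_1 = torch.randn(shape)
    t = torch.ones(shape[0])
    s = torch.zeros(shape[0])
    return solution_map(net, x_1, t, s)

@torch.no_grad()
def sample_few_step(net, shape, times=(1.0, 0.5)):
    """Few-step generation: jump to a clean estimate, then re-noise and repeat.

    The time schedule and the re-noising rule (linear interpolant) are
    illustrative assumptions.
    """
    x = torch.randn(shape)
    for i, t_cur in enumerate(times):
        t = torch.full((shape[0],), t_cur)
        s = torch.zeros(shape[0])
        x0_hat = solution_map(net, x, t, s)            # clean estimate at t = 0
        if i + 1 < len(times):
            t_next = times[i + 1]
            # Re-noise the estimate to time t_next via the linear interpolant (assumption).
            x = (1.0 - t_next) * x0_hat + t_next * torch.randn(shape)
        else:
            x = x0_hat
    return x
```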
6. Limitations, Extensions, and Forward-Looking Directions
SoFlow is currently best suited to scenarios demanding minimal NFE; at ultra-low NFE budgets a quality tradeoff remains relative to deep multi-step diffusion models. Several extension directions are highlighted:
- Optimization of noising/interpolant schedules.
- Improved weighting schemes or variance reduction within the loss.
- Application to text-to-image, video, or hybrid few-step regimes.
- Empirical exploration of higher NFE hybrids (2–4 steps) for further FID improvements.
SoFlow’s formulation offers a platform for rapid progress in efficient generative modeling, with a unified framework that supports precise velocity field learning, CFG integration, and closed-form mapping from prior to data (Luo et al., 17 Dec 2025).
For the distinct SoFlow (“Semi-dilute Flow”) model relevant to polymer coil compression in Couette flow, see Dunstan (Dunstan, 2014); that framework addresses coil compression, not generative learning.