SplitMeanFlow Generative Modeling Framework
- SplitMeanFlow is a generative modeling framework that enforces algebraic interval splitting consistency, eliminating the need for expensive derivative calculations.
- It splits time intervals into subintervals using the additivity of definite integrals, which simplifies training and improves hardware compatibility.
- Empirical results in speech synthesis demonstrate quality on par with multi-step models while significantly reducing computational overhead.
SplitMeanFlow is a generative modeling framework based on the principle of enforcing algebraic consistency conditions for average velocity fields, rather than relying on differential or instantaneous velocity identities. Developed to address efficiency and stability challenges in few-step and one-step generative sampling, including applications in large-scale speech synthesis, SplitMeanFlow builds on recent advances in MeanFlow by introducing interval splitting consistency: an algebraic identity rooted in the additivity of definite integrals. This approach enables models to be trained without expensive derivative calculations and with broader hardware compatibility, yielding significant computational savings and empirical performance comparable to multi-step flow matching models.
1. Theoretical Foundations: Interval Splitting Consistency
SplitMeanFlow is centered on the definition of the average velocity field over an interval $[r, t]$:

$$u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau,$$

where $v(z_\tau, \tau)$ is the instantaneous velocity field and $z_\tau$ describes the state at time $\tau$ along the flow.
The central theoretical innovation of SplitMeanFlow is the "Interval Splitting Consistency" identity, which derives from the additivity property of definite integrals. For any $s$ with $r < s < t$,

$$(t - r)\, u(z_t, r, t) = (s - r)\, u(z_s, r, s) + (t - s)\, u(z_t, s, t).$$

This relation asserts that the net displacement over $[r, t]$ is the sum of the displacements over $[r, s]$ and $[s, t]$. Rearranged,

$$u(z_t, r, t) = \frac{s - r}{t - r}\, u(z_s, r, s) + \frac{t - s}{t - r}\, u(z_t, s, t).$$
This algebraic self-consistency property obviates the need to enforce differential identities in model training.
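Because the identity is purely algebraic, it can be verified directly. Below is a minimal numerical sketch using a toy one-dimensional flow $\dot{z} = -z$ with a closed-form trajectory; the velocity field is chosen only for illustration and is not part of the original framework.

```python
import numpy as np

# Toy flow dz/dtau = -z, whose trajectory z(tau) = z0 * exp(-tau) is known
# in closed form. Any flow would do; this one keeps the check exact.
def z(tau, z0=2.0):
    return z0 * np.exp(-tau)

def avg_velocity(r, t):
    # The average velocity over [r, t] is displacement / interval length:
    # u = (z_t - z_r) / (t - r) = (1/(t - r)) * integral_r^t v(z_tau, tau) dtau
    return (z(t) - z(r)) / (t - r)

r, s, t = 0.1, 0.4, 0.9
lhs = (t - r) * avg_velocity(r, t)                                 # displacement over [r, t]
rhs = (s - r) * avg_velocity(r, s) + (t - s) * avg_velocity(s, t)  # sum of split displacements
assert np.isclose(lhs, rhs)  # additivity of definite integrals makes these equal
```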
Importantly, the paper demonstrates that the differential identity utilized in MeanFlow (Geng et al., 19 May 2025) is mathematically recovered as the limiting case $s \to t$, establishing SplitMeanFlow as a direct generalization.
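A sketch of this reduction: differentiating the splitting identity with respect to $t$ (holding $r$ and $s$ fixed) gives

$$u(z_t, r, t) + (t - r)\,\frac{d}{dt} u(z_t, r, t) = u(z_t, s, t) + (t - s)\,\frac{d}{dt} u(z_t, s, t).$$

Taking $s \to t$, the boundary condition $u(z_t, t, t) = v(z_t, t)$ applies and the $(t - s)$ term vanishes, leaving

$$u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt} u(z_t, r, t),$$

which is exactly the MeanFlow differential identity.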
2. Comparison to MeanFlow and Previous Approaches
MeanFlow introduced the notion of average velocity in generative modeling and trained neural networks to satisfy a differential identity connecting instantaneous and average velocities:

$$u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt} u(z_t, r, t), \qquad \frac{d}{dt} u = v\, \partial_z u + \partial_t u.$$

Computing this total derivative is typically implemented using Jacobian–vector products (JVPs), which can burden implementation and limit hardware compatibility.
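For contrast, here is a minimal PyTorch sketch of the derivative machinery such an objective needs; the names `u_theta`, `z_t`, and `v_t` are hypothetical placeholders, and this illustrates the JVP requirement rather than any reference implementation.

```python
import torch

def meanflow_loss(u_theta, z_t, v_t, r, t):
    """Sketch of a MeanFlow-style objective (hypothetical names).
    z_t: current state; v_t: instantaneous velocity at z_t;
    r, t: interval endpoints, shaped to broadcast against the output."""
    # The total derivative d/dt u = v * du/dz + du/dt is realized as a
    # Jacobian-vector product along the tangent (dz/dt, dr/dt, dt/dt) = (v, 0, 1).
    u, dudt = torch.func.jvp(
        u_theta,
        (z_t, r, t),
        (v_t, torch.zeros_like(r), torch.ones_like(t)),
    )
    # MeanFlow identity u = v - (t - r) * du/dt, enforced as a regression
    # against a detached (stop-gradient) target.
    target = v_t - (t - r) * dudt
    return ((u - target.detach()) ** 2).mean()
```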
SplitMeanFlow, by contrast, does not require differentiation. Instead, the training objective directly enforces the interval splitting consistency via forward evaluations of $u_\theta(z_t, r, t)$, $u_\theta(z_s, r, s)$, and $u_\theta(z_t, s, t)$. This circumvents the need for higher-order gradients, leading to easier and more stable training. The relationship to MeanFlow is formalized by showing that SplitMeanFlow's consistency reduces to the MeanFlow identity in the infinitesimal interval split limit.
3. Implementation Methodology
During training, the SplitMeanFlow framework samples time triplets $(r, s, t)$ with $r < s < t$ and evaluates the network's prediction of the average velocity on $[r, t]$ and on the two subintervals $[r, s]$ and $[s, t]$, at appropriately evolved states $z_s$ and $z_t$.
A typical training step involves:
- Sampling an initial state $z_r$ and evolving it along the flow to $z_s$ and $z_t$.
- Computing $u_\theta(z_t, r, t)$, $u_\theta(z_s, r, s)$, and $u_\theta(z_t, s, t)$ (where $u_\theta$ is the parameterized average velocity field).
- Enforcing the interval splitting consistency via a loss such as:

$$\mathcal{L}(\theta) = \mathbb{E}\left[\,\left\| u_\theta(z_t, r, t) - \frac{s - r}{t - r}\, u_\theta(z_s, r, s) - \frac{t - s}{t - r}\, u_\theta(z_t, s, t) \right\|_2^2\,\right]$$
- Backpropagation uses only standard first-order derivatives, simplifying codebase and improving hardware portability.
This methodology entirely removes the Jacobian–vector product calculation required in differential approaches, significantly reducing computational overhead and increasing training stability.
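A minimal PyTorch sketch of such a training step follows, assuming a network `u_theta(z, r, t)` that predicts the average velocity over $[r, t]$; the names and the stop-gradient placement are illustrative assumptions rather than the paper's reference code.

```python
import torch

def splitmeanflow_loss(u_theta, z_s, z_t, r, s, t):
    """Sketch of one SplitMeanFlow training step: three forward passes, no JVP.
    u_theta(z, a, b) predicts the average velocity over [a, b];
    r, s, t are shaped to broadcast against the network output."""
    u_full  = u_theta(z_t, r, t)   # full interval [r, t]
    u_left  = u_theta(z_s, r, s)   # subinterval [r, s]
    u_right = u_theta(z_t, s, t)   # subinterval [s, t]
    # Interval splitting consistency: the full-interval prediction must match
    # the length-weighted combination of the subinterval predictions. Detaching
    # the combination treats it as a fixed regression target, so only standard
    # first-order backpropagation is required.
    lam = (t - s) / (t - r)
    target = (1 - lam) * u_left + lam * u_right
    return ((u_full - target.detach()) ** 2).mean()
```

Detaching the subinterval predictions turns the consistency identity into an ordinary regression target, which is what keeps backpropagation first-order.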
4. Computational Advantages and Deployment
Because the SplitMeanFlow objective uses only algebraic relationships and needs three forward passes per triplet for consistency enforcement, model training and inference are both computationally efficient.
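Inference with a learned average velocity field is equally direct: each sampling step covers a whole subinterval with a single forward pass. A sketch, assuming the convention of noise at time 1 and data at time 0 (the time direction and the helper `sample` are illustrative assumptions):

```python
import torch

@torch.no_grad()
def sample(u_theta, z_init, steps=(1.0, 0.5, 0.0)):
    """Sketch of few-step sampling with a learned average velocity field.
    z_init: e.g. Gaussian noise at time 1; steps: descending time grid."""
    z = z_init
    for t, r in zip(steps[:-1], steps[1:]):
        t_vec = torch.full((z.shape[0],), t)
        r_vec = torch.full((z.shape[0],), r)
        # Displacement over [r, t] is (t - r) times the average velocity,
        # so the whole subinterval is traversed in one network evaluation.
        z = z - (t - r) * u_theta(z, r_vec, t_vec)
    return z
```

With `steps=(1.0, 0.0)` this collapses to one-step generation: a single network evaluation maps noise directly to a sample.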
Empirical evaluations have demonstrated:
- Successful deployment in large-scale speech synthesis (Seed-TTS and Doubao systems), retaining quality parity with ten-step Flow Matching baselines while reducing sampling evaluations by up to 20x.
- One-step and two-step models trained via SplitMeanFlow achieve speaker similarity, word error rate (WER), and human perceptual quality metrics nearly identical to strong multi-step models, implying that the algebraic consistency constraint is effective for matching the data manifold in few-step generative settings.
- The algebraic formulation also yields broader hardware compatibility, as it does not depend on deep automatic differentiation support.
5. Practical Applications and Empirical Results
SplitMeanFlow has been validated in commercial-scale speech generation where fast sampling is essential. The approach is particularly relevant to any domain where generative models must operate under tight latency budgets, or must be deployed on hardware accelerators with limited support for advanced differentiation.
Key outcomes include:
| Model | Steps (NFE) | Speaker Sim. | WER | Inference Speedup |
|---|---|---|---|---|
| Flow Matching | 10 | ≈ baseline | ≈ baseline | 1x |
| SplitMeanFlow | 2 | ≈ baseline | ≈ baseline | 5–10x |
| SplitMeanFlow | 1 | ≈ baseline | ≈ baseline | 10–20x |
This table summarizes key audio synthesis metrics and speed improvements from the Seed-TTS and Doubao results.
A plausible implication is that the SplitMeanFlow framework, due to its algebraic, model-agnostic nature, can generalize to image, video, and other sequence generative tasks where few-step or real-time inference is crucial.
6. Theoretical Generality and Future Directions
SplitMeanFlow highlights the power of algebraic, self-referential consistency relations for model supervision. By grounding the learning objective in first principles, specifically the additivity of definite integrals for flow displacements, the framework avoids reliance on infinitesimal calculus, making it robust to both wide and narrow interval splits (i.e., single-step or multi-step regimes).
Potential future research directions include:
- Generalizing interval splitting consistency objectives to alternative data manifolds and to high-dimensional or multimodal generative tasks.
- Combining algebraic consistency with architectural innovations and other forms of guidance.
- Investigating broader implications for stability, generalization, and self-consistency in other classes of deep dynamical models.
- Exploring synergies with distillation and compressive techniques for further performance gains in very low-step settings.
7. Context within the Generative Modeling Landscape
SplitMeanFlow represents a shift towards more flexible and fundamentally grounded supervision principles in generative modeling. By subsuming differential formulations as limiting cases, it establishes a unifying algebraic framework. Its demonstrated speed and simplicity position it as a new baseline for future work in efficient, high-quality generative algorithms across modalities.