SplitMeanFlow Generative Modeling Framework
- SplitMeanFlow is a generative modeling framework that enforces algebraic interval splitting consistency, eliminating the need for expensive derivative calculations.
- It splits time intervals into subintervals using the additivity of definite integrals, which simplifies training and improves hardware compatibility.
- Empirical results in speech synthesis demonstrate quality on par with multi-step models while significantly reducing computational overhead.
SplitMeanFlow is a generative modeling framework based on the principle of enforcing algebraic consistency conditions for average velocity fields, rather than relying on differential or instantaneous velocity identities. Developed to address efficiency and stability challenges in few-step and one-step generative sampling, including applications in large-scale speech synthesis, SplitMeanFlow builds on recent advances in MeanFlow by introducing interval splitting consistency: an algebraic identity rooted in the additivity of definite integrals. This approach enables models to be trained without expensive derivative calculations and with broader hardware compatibility, yielding significant computational savings and empirical performance comparable to multi-step flow matching models.
1. Theoretical Foundations: Interval Splitting Consistency
SplitMeanFlow is centered on the definition of the average velocity field over an interval $[r, t]$:

$$u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau,$$

where $v(z_\tau, \tau)$ is the instantaneous velocity field and $z_\tau$ describes the state at time $\tau$ along the flow.
The central theoretical innovation of SplitMeanFlow is the "Interval Splitting Consistency" identity, which derives from the additivity property of definite integrals. For any $s$ with $r < s < t$,

$$(t - r)\, u(z_t, r, t) = (s - r)\, u(z_s, r, s) + (t - s)\, u(z_t, s, t).$$

This relation asserts that the net displacement over $[r, t]$ is the sum of the displacements over $[r, s]$ and $[s, t]$. Rearranged,

$$u(z_t, r, t) = \frac{s - r}{t - r}\, u(z_s, r, s) + \frac{t - s}{t - r}\, u(z_t, s, t).$$
This algebraic self-consistency property obviates the need to enforce differential identities in model training.
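Because the identity is purely algebraic, it can be verified directly. Below is a minimal numerical sketch using a toy one-dimensional flow $\dot{z} = -z$ with a closed-form trajectory; the velocity field is chosen only for illustration and is not part of the original framework.

```python
import numpy as np

# Toy flow dz/dtau = -z, whose trajectory z(tau) = z0 * exp(-tau) is known
# in closed form. Any flow would do; this one keeps the check exact.
def z(tau, z0=2.0):
    return z0 * np.exp(-tau)

def avg_velocity(r, t):
    # The average velocity over [r, t] is displacement / interval length:
    # u = (z_t - z_r) / (t - r) = (1/(t - r)) * integral_r^t v(z_tau, tau) dtau
    return (z(t) - z(r)) / (t - r)

r, s, t = 0.1, 0.4, 0.9
lhs = (t - r) * avg_velocity(r, t)                                 # displacement over [r, t]
rhs = (s - r) * avg_velocity(r, s) + (t - s) * avg_velocity(s, t)  # sum of split displacements
assert np.isclose(lhs, rhs)  # additivity of definite integrals makes these equal
```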
Importantly, the paper demonstrates that the differential identity utilized in MeanFlow (Geng et al., 19 May 2025) is mathematically recovered as the limiting case $s \to t$, establishing SplitMeanFlow as a direct generalization.
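A sketch of this reduction: differentiating the splitting identity with respect to $t$ (holding $r$ and $s$ fixed) gives

$$u(z_t, r, t) + (t - r)\,\frac{d}{dt} u(z_t, r, t) = u(z_t, s, t) + (t - s)\,\frac{d}{dt} u(z_t, s, t).$$

Taking $s \to t$, the boundary condition $u(z_t, t, t) = v(z_t, t)$ applies and the $(t - s)$ term vanishes, leaving

$$u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt} u(z_t, r, t),$$

which is exactly the MeanFlow differential identity.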
2. Comparison to MeanFlow and Previous Approaches
MeanFlow introduced the notion of average velocity in generative modeling and trained neural networks to satisfy a differential identity connecting instantaneous and average velocities:

$$u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt} u(z_t, r, t), \qquad \frac{d}{dt} u = v\, \partial_z u + \partial_t u.$$

Computing this total derivative is typically implemented using Jacobian–vector products (JVPs), which can burden implementation and limit hardware compatibility.
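For contrast, here is a minimal PyTorch sketch of the derivative machinery such an objective needs; the names `u_theta`, `z_t`, and `v_t` are hypothetical placeholders, and this illustrates the JVP requirement rather than any reference implementation.

```python
import torch

def meanflow_loss(u_theta, z_t, v_t, r, t):
    """Sketch of a MeanFlow-style objective (hypothetical names).
    z_t: current state; v_t: instantaneous velocity at z_t;
    r, t: interval endpoints, shaped to broadcast against the output."""
    # The total derivative d/dt u = v * du/dz + du/dt is realized as a
    # Jacobian-vector product along the tangent (dz/dt, dr/dt, dt/dt) = (v, 0, 1).
    u, dudt = torch.func.jvp(
        u_theta,
        (z_t, r, t),
        (v_t, torch.zeros_like(r), torch.ones_like(t)),
    )
    # MeanFlow identity u = v - (t - r) * du/dt, enforced as a regression
    # against a detached (stop-gradient) target.
    target = v_t - (t - r) * dudt
    return ((u - target.detach()) ** 2).mean()
```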
SplitMeanFlow, by contrast, does not require differentiation. Instead, the training objective directly enforces the interval splitting consistency via forward evaluations of $u_\theta(z_t, r, t)$, $u_\theta(z_s, r, s)$, and $u_\theta(z_t, s, t)$. This circumvents the need for higher-order gradients, leading to easier and more stable training. The relationship to MeanFlow is formalized by showing that SplitMeanFlow's consistency reduces to the MeanFlow identity in the infinitesimal interval split limit.
3. Implementation Methodology
During training, the SplitMeanFlow framework samples time triplets $(r, s, t)$ with $r < s < t$ and evaluates the network's prediction of the average velocity on $[r, t]$ and on the two subintervals $[r, s]$ and $[s, t]$, at appropriately evolved states $z_s$ and $z_t$.
A typical training step involves:
- Sampling an initial state $z_r$ and evolving it along the flow to $z_s$ and $z_t$.
- Computing $u_\theta(z_t, r, t)$, $u_\theta(z_s, r, s)$, and $u_\theta(z_t, s, t)$ (where $u_\theta$ is the parameterized average velocity field).
- Enforcing the interval splitting consistency via a loss such as:

$$\mathcal{L}(\theta) = \mathbb{E}\left[\,\left\| u_\theta(z_t, r, t) - \frac{s - r}{t - r}\, u_\theta(z_s, r, s) - \frac{t - s}{t - r}\, u_\theta(z_t, s, t) \right\|_2^2\,\right]$$
- Backpropagation uses only standard first-order derivatives, simplifying codebase and improving hardware portability.
This methodology entirely removes the Jacobian–vector product calculation required in differential approaches, significantly reducing computational overhead and increasing training stability.
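A minimal PyTorch sketch of such a training step follows, assuming a network `u_theta(z, r, t)` that predicts the average velocity over $[r, t]$; the names and the stop-gradient placement are illustrative assumptions rather than the paper's reference code.

```python
import torch

def splitmeanflow_loss(u_theta, z_s, z_t, r, s, t):
    """Sketch of one SplitMeanFlow training step: three forward passes, no JVP.
    u_theta(z, a, b) predicts the average velocity over [a, b];
    r, s, t are shaped to broadcast against the network output."""
    u_full  = u_theta(z_t, r, t)   # full interval [r, t]
    u_left  = u_theta(z_s, r, s)   # subinterval [r, s]
    u_right = u_theta(z_t, s, t)   # subinterval [s, t]
    # Interval splitting consistency: the full-interval prediction must match
    # the length-weighted combination of the subinterval predictions. Detaching
    # the combination treats it as a fixed regression target, so only standard
    # first-order backpropagation is required.
    lam = (t - s) / (t - r)
    target = (1 - lam) * u_left + lam * u_right
    return ((u_full - target.detach()) ** 2).mean()
```

Detaching the subinterval predictions turns the consistency identity into an ordinary regression target, which is what keeps backpropagation first-order.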
4. Computational Advantages and Deployment
Because the SplitMeanFlow objective uses only algebraic relationships and needs three forward passes per triplet for consistency enforcement, model training and inference are both computationally efficient.
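Inference with a learned average velocity field is equally direct: each sampling step covers a whole subinterval with a single forward pass. A sketch, assuming the convention of noise at time 1 and data at time 0 (the time direction and the helper `sample` are illustrative assumptions):

```python
import torch

@torch.no_grad()
def sample(u_theta, z_init, steps=(1.0, 0.5, 0.0)):
    """Sketch of few-step sampling with a learned average velocity field.
    z_init: e.g. Gaussian noise at time 1; steps: descending time grid."""
    z = z_init
    for t, r in zip(steps[:-1], steps[1:]):
        t_vec = torch.full((z.shape[0],), t)
        r_vec = torch.full((z.shape[0],), r)
        # Displacement over [r, t] is (t - r) times the average velocity,
        # so the whole subinterval is traversed in one network evaluation.
        z = z - (t - r) * u_theta(z, r_vec, t_vec)
    return z
```

With `steps=(1.0, 0.0)` this collapses to one-step generation: a single network evaluation maps noise directly to a sample.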
Empirical evaluations have demonstrated:
- Successful deployment in large-scale speech synthesis (Seed-TTS and Doubao systems), retaining quality parity with ten-step Flow Matching baselines while reducing sampling evaluations by up to 20x.
- One-step and two-step models trained via SplitMeanFlow achieve speaker similarity, word error rate (WER), and human perceptual quality metrics nearly identical to strong multi-step models, implying that the algebraic consistency constraint is effective for matching the data manifold in few-step generative settings.
- The algebraic formulation also yields broader hardware compatibility, as it does not depend on deep automatic differentiation support.
5. Practical Applications and Empirical Results
SplitMeanFlow has been validated in commercial-scale speech generation where fast sampling is essential. The approach is particularly relevant to any domain where generative models must operate under tight latency budgets, or must be deployed on hardware accelerators with limited support for advanced differentiation.
Key outcomes include:
| Model | Steps (NFE) | Speaker Sim. | WER | Inference Speedup |
|---|---|---|---|---|
| Flow Matching | 10 | ≈ baseline | ≈ baseline | 1x |
| SplitMeanFlow | 2 | ≈ baseline | ≈ baseline | 5–10x |
| SplitMeanFlow | 1 | ≈ baseline | ≈ baseline | 10–20x |
This table summarizes key audio synthesis metrics and speed improvements from the Seed-TTS and Doubao results.
A plausible implication is that the SplitMeanFlow framework, due to its algebraic, model-agnostic nature, can generalize to image, video, and other sequence generative tasks where few-step or real-time inference is crucial.
6. Theoretical Generality and Future Directions
SplitMeanFlow highlights the power of algebraic, self-referential consistency relations for model supervision. By grounding the learning objective in first principles, specifically the additivity of definite integrals for flow displacements, the framework avoids reliance on infinitesimal calculus, making it robust to both wide and narrow interval splits (i.e., single-step or multi-step regimes).
Potential future research directions include:
- Generalizing interval splitting consistency objectives to alternative data manifolds and to high-dimensional or multimodal generative tasks.
- Combining algebraic consistency with architectural innovations and other forms of guidance.
- Investigating broader implications for stability, generalization, and self-consistency in other classes of deep dynamical models.
- Exploring synergies with distillation and compressive techniques for further performance gains in very low-step settings.
7. Context within the Generative Modeling Landscape
SplitMeanFlow represents a shift towards more flexible and fundamentally grounded supervision principles in generative modeling. By subsuming differential formulations as limiting cases, it establishes a unifying algebraic framework. Its demonstrated speed and simplicity position it as a new baseline for future work in efficient, high-quality generative algorithms across modalities.