
Rectified Flow Matching

Updated 19 October 2025
  • Rectified Flow Matching is a generative modeling framework that constructs near-linear ODE paths between distributions using learned velocity fields.
  • It employs self-conditioning and reflow training strategies to refine trajectories, dramatically reducing sampling steps while preserving sample quality.
  • Applications span from speech synthesis and image generation to fluid dynamics, with extensions addressing hierarchical, variational, and latent-constrained modeling challenges.

Rectified Flow Matching (RFM) is a paradigm in generative modeling that reformulates the transport of probability mass between distributions as the integration of an ordinary differential equation (ODE) along deterministic, typically straight trajectories induced by a learned velocity (vector) field. Originally developed as a solution to the inefficiency and sampling curvature inherent in diffusion and flow-matching models, RFM has been widely adopted in speech synthesis, image generation, video-to-audio, audio editing, fluid dynamics, and other scientific and creative domains. It is characterized by the explicit construction of linear or near-linear paths between a simple source (e.g., Gaussian noise) and complex target distributions, with models trained to match either the ground-truth or self-refined velocity fields conditional on context.

1. Mathematical Formulation and Core Principles

At the heart of RFM lies a time-dependent vector field $v_\theta(x, t)$ parameterized by a neural network. The forward process is described by the ODE

$$\frac{dx_t}{dt} = v_\theta(x_t, t),$$

with $x_0 \sim p_0$ (source, e.g., noise); the process maps $x_0$ to $x_1 \sim p_1$ (target distribution). The most canonical instantiation uses straight-line "rectified" interpolation:

$$x_t = (1-t)\,x_0 + t\,x_1, \qquad v(x_t, t) = x_1 - x_0.$$

The model is trained to minimize the mean squared error (MSE) between the predicted velocity and this target:

$$\mathcal{L}_\text{RFM}(\theta) = \mathbb{E}_{t, x_0, x_1}\left[\left\|v_\theta(x_t, t) - (x_1 - x_0)\right\|^2\right].$$

At inference, starting from $x_0 \sim p_0$, one numerically solves the ODE (e.g., by Euler updates), typically requiring far fewer integration steps than conventional diffusion-based approaches.
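The training objective and Euler sampler can be sketched end-to-end in one dimension. Everything here is illustrative: the marginals are toy Gaussians, and a polynomial least-squares fit stands in for the neural network $v_\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Toy marginals: p0 = N(0, 1) (noise), p1 = N(3, 1) (data).
x0 = rng.standard_normal(n)
x1 = 3.0 + rng.standard_normal(n)
t = rng.uniform(0.0, 1.0, n)

# Straight-line interpolation and its conditional target velocity.
xt = (1.0 - t) * x0 + t * x1
target_v = x1 - x0

def features(x, t):
    # Low-degree polynomial features stand in for a neural network.
    return np.stack([t**k for k in range(5)] + [x * t**k for k in range(5)], axis=1)

# "Training": least squares minimizes the RFM mean-squared-error
# objective over this linear-in-parameters family.
theta, *_ = np.linalg.lstsq(features(xt, t), target_v, rcond=None)

def v_theta(x, t_scalar):
    return features(x, np.full_like(x, t_scalar)) @ theta

# Inference: Euler integration of dx/dt = v_theta(x, t) from fresh noise.
x = rng.standard_normal(n)
steps = 200
dt = 1.0 / steps
for k in range(steps):
    x = x + dt * v_theta(x, k * dt)

print(x.mean(), x.std())  # should land close to the target N(3, 1)
```

The same loop structure carries over to real models: only the regression is replaced by stochastic gradient descent on a network, and the 1-D samples by images, mel-spectrograms, etc.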

A distinctive enhancement is the "reflow"/rectification step: after initial training, pairs $(x_0', \hat{x}_1)$ are generated from the model itself (with $\hat{x}_1$ a generated endpoint), and the velocity field is retrained to align these self-generated pairs, thus "straightening" learned trajectories and increasing sampling efficiency.

2. Training Strategies, Self-Conditioning, and Refinements

Reflow and Self-Conditioning

Vanilla flow matching often yields curved trajectories because source–target pairs are sampled independently during training. The reflow step, first formalized in speech synthesis (e.g., VoiceFlow (Guo et al., 2023)), uses the model's own generative path to produce new pairings. The model is then retrained to match $(x_0', \hat{x}_1)$, as in

$$\min_\theta \mathbb{E}_{t, x_0', \hat{x}_1, x_t}\left[\left\|v_\theta(x_t, t) - (\hat{x}_1 - x_0')\right\|^2\right],$$

where $x_0'$ is a fresh sample from $p_0$ and $\hat{x}_1$ is the endpoint obtained by integrating the model's ODE starting at $x_0'$. This iterative self-conditioning pushes ODE trajectories to be increasingly straight and aligned with the optimal transport between $p_0$ and $p_1$.
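One pass of this pairing can be made concrete in a toy setting. The sketch below substitutes the closed-form conditional velocity for independent 1-D Gaussians (an assumption, standing in for a trained $v_\theta$) with $p_0 = \mathcal{N}(0,1)$ and $p_1 = \mathcal{N}(3, 4)$; integrating its ODE produces deterministic pairs $(x_0', \hat{x}_1)$ that trace the monotone OT map $x \mapsto 3 + 2x$, so the reflowed straight-line coupling is already optimal.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 3.0, 2.0  # target N(mu, sigma^2); source is N(0, 1)

def v_star(x, t):
    # Closed-form conditional velocity E[x1 - x0 | x_t = x] for
    # independent Gaussians x0 ~ N(0, 1), x1 ~ N(mu, sigma^2).
    var_t = (1.0 - t) ** 2 + (t * sigma) ** 2
    return mu + (t * sigma**2 - (1.0 - t)) / var_t * (x - t * mu)

# Reflow pairing: integrate the model's ODE from fresh source samples.
x0p = rng.standard_normal(10_000)
x = x0p.copy()
steps = 400
dt = 1.0 / steps
for k in range(steps):
    x = x + dt * v_star(x, k * dt)
x_hat1 = x

# For these Gaussians the flow map is the monotone OT map x -> mu + sigma * x,
# so the pairs (x0p, x_hat1) never cross and are exactly straightenable.
print(np.abs(x_hat1 - (mu + sigma * x0p)).max())
```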

Closed-Form Vector Fields and Explicit Flow Matching

Explicit Flow Matching (ExFM) (Ryzhakov et al., 5 Feb 2024) frames RFM as loss rectification by integrating over the conditional distribution, producing analytic expressions for the optimal field. This reduces gradient variance and speeds convergence:

$$v(x, t) = \int w(t, x_1, x)\, \rho_c(x \mid x_1, t)\, dx_1,$$

where $w$ is a deterministic function and $\rho_c$ is the conditional density. This formalization clarifies and justifies the rectification process and is especially tractable in Gaussian or Gaussian-mixture settings.
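As a concrete, hedged instance of this integral, consider independent Gaussians (one of the tractable settings mentioned above). Since $(x_1 - x_0, x_t)$ is then jointly Gaussian, the integral collapses to a linear conditional expectation:

```latex
% Independent Gaussians: x_0 ~ N(mu_0, sigma_0^2), x_1 ~ N(mu_1, sigma_1^2),
% with x_t = (1 - t) x_0 + t x_1. Then v(x, t) = E[x_1 - x_0 | x_t = x]:
v(x, t) = (\mu_1 - \mu_0)
  + \frac{t\,\sigma_1^2 - (1-t)\,\sigma_0^2}{(1-t)^2 \sigma_0^2 + t^2 \sigma_1^2}
    \Bigl(x - (1-t)\,\mu_0 - t\,\mu_1\Bigr).
```

The coefficient is $\mathrm{Cov}(x_1 - x_0, x_t)/\mathrm{Var}(x_t)$, which makes explicit how the rectified field transitions from pushing mass away from the source ($t \to 0$) to pulling it toward the target ($t \to 1$).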

Hierarchical and Variational Extensions

Recent extensions include hierarchical rectified flow frameworks (Zhang et al., 17 Jul 2025), in which a hierarchy of ODEs models not just positions but also higher-order dynamics (e.g., acceleration, the "velocity of velocity"), and variational methods (Guo et al., 13 Feb 2025), which capture multi-modality in the velocity field by introducing latent variables $z$:

$$p(v \mid x_t, t, z) = \mathcal{N}(v;\, v_\theta(x_t, t, z), I), \qquad z \sim p(z).$$

The resulting lower bound combines reconstruction and KL divergence terms, enabling the model to move beyond the mean-field assumptions inherent in classic RFM.
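The two terms of that bound can be sketched numerically for diagonal Gaussians. The encoder statistics and velocity prediction below are hypothetical placeholders, not the paper's architecture; only the loss algebra is shown.

```python
import numpy as np

def gaussian_kl(mean, std):
    # KL( N(mean, std^2) || N(0, 1) ), summed over latent dimensions:
    # the latent-regularization term of the variational bound.
    return 0.5 * np.sum(std**2 + mean**2 - 1.0 - 2.0 * np.log(std))

def recon_term(v_target, v_pred):
    # -log N(v_target; v_pred, I) up to an additive constant:
    # the velocity-reconstruction term.
    return 0.5 * np.sum((v_target - v_pred) ** 2)

# One training example, with placeholder encoder/decoder outputs:
# q(z | x_t, t, x_1) = N(mean, std^2) and v_pred = v_theta(x_t, t, z).
mean, std = np.array([0.3, -0.1]), np.array([0.9, 1.1])
v_target = np.array([1.5])
v_pred = np.array([1.2])

elbo_loss = recon_term(v_target, v_pred) + gaussian_kl(mean, std)
print(elbo_loss)
```

In practice both terms are averaged over a batch, and the KL term is what keeps the latent $z$ informative without collapsing the velocity prediction to a single mean direction.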

3. Relation to Diffusion Models and Efficiency Gains

RFM is intimately related to the probability flow ODE underlying diffusion models. In standard diffusion, the generative trajectory is curved and noisy, requiring hundreds of SDE or ODE integration steps. Rectified flow matching (and its generalization, rectified diffusion (Wang et al., 9 Oct 2024)) leverages deterministic noise–sample pairs (from a pretrained model or via self-consistent training) and focuses exclusively on first-order ODE paths. The crucial insight from (Wang et al., 9 Oct 2024) is that a straight path per se is not mandatory, provided that the path is a first-order ODE approximation with consistent predictions along the trajectory; in many popular parametrizations (e.g., DDPM or Sub-VP), the ODE path is inherently curved, but can be straightened via a change of variables.

This realization unifies rectified flows and first-order consistency models, and greatly simplifies training and distillation pipelines (e.g., as in AudioTurbo (Zhao et al., 28 May 2025) and industry-scale FGM (Huang et al., 25 Oct 2024)), enabling extremely fast inference—often down to one or a few steps—while maintaining or surpassing the quality of multi-step baseline models.
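The efficiency gap can be illustrated with a toy Gaussian field (the closed-form conditional velocity for independent 1-D Gaussians, a stand-in assumption for a pretrained model): along the curved marginal field, a single Euler step collapses every sample to the target mean, whereas fine integration — or equivalently a one-step jump along a straightened (reflowed) coupling — recovers the full target spread.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 3.0, 2.0

def v_star(x, t):
    # Curved marginal velocity for N(0, 1) -> N(mu, sigma^2), independent coupling.
    var_t = (1.0 - t) ** 2 + (t * sigma) ** 2
    return mu + (t * sigma**2 - (1.0 - t)) / var_t * (x - t * mu)

x0 = rng.standard_normal(5_000)

# One Euler step along the curved field: v(x, 0) = mu - x, so every sample
# lands exactly at mu and the output collapses to a point (std ~ 0).
one_step = x0 + v_star(x0, 0.0)

# Fine integration of the same field recovers the target marginal,
# matching what a straightened (reflowed) field achieves in one step.
x = x0.copy()
steps = 400
for k in range(steps):
    x = x + (1.0 / steps) * v_star(x, k / steps)

print(one_step.std(), x.std())
```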

4. Theoretical Properties, Optimal Transport, and Invariance

RFM admits several key theoretical properties:

  • Affine and additive invariances: The learned velocity field transforms predictably under affine transformations of the data, translations, and scaling (Hertrich et al., 26 May 2025). These invariances echo those of optimal transport velocity fields (Benamou–Brenier).
  • Explicit construction in special cases: For (joint) Gaussian and mixture distributions, explicit closed-form solutions exist for the optimal velocity, and in the independent Gaussian case, RFM already yields the unique OT map in the first rectification.
  • Caveats regarding optimal transport: While gradient-constrained RFM (enforcing $v_t = \nabla \phi_t$) can, under restrictive assumptions, yield OT maps, this is not guaranteed in general without connected supports and regularity conditions; counterexamples show that even zero-loss gradient velocity fields may fail to be optimal (Hertrich et al., 26 May 2025).

5. Practical Applications and Empirical Results

| Domain | Key Model(s) | Notable RFM Benefit | Sampling Steps | Key Metric(s) |
|---|---|---|---|---|
| Text-to-Speech | VoiceFlow, SlimSpeech | Low-step synthesis with high MOS/MCD | 1–10 | MOS, MOSnet, MCD |
| Video-to-Audio | Frieren | High alignment, fast generation | 1–25 | Inception score, alignment |
| Sound Separation | FlowSep | SOTA quality, efficient | ≤ 10 | FAD, CLAPScore |
| Audio Editing | RFM-Editing | Precise, robust editing | 10–30 | CLAP, FD, KL |
| Image Synthesis | Rectified Diffusion, FGM | 1-step generation, strong FID | 1 | FID, GenEval |
| Fluid Modeling | ReFlow | Multiscale, fast sampling | 8–10 | Wasserstein, $L^2$ |
| Motion Generation (Text2H) | MotionFLUX | Real-time, aligned motions | 1–few | FID, R-Precision |

Empirical studies (Guo et al., 2023, Wang et al., 1 Jun 2024, Huang et al., 25 Oct 2024, Samaddar et al., 7 May 2025, Armegioiu et al., 3 Jun 2025) consistently demonstrate the following:

  • Inference acceleration: RFM methods often reduce required inference steps by 10×–100× relative to vanilla diffusion or flow models.
  • Sample quality robustness: MOS/MCD in TTS, FID in images, and other quantitative metrics show minimal quality degradation at low step counts.
  • Task adaptability: RFM-based frameworks are instantiated across text, image, audio, video, motion, and scientific modeling, often with domain-specific architectural enhancements (e.g., transformer-based cross-modal fusion in Frieren (Wang et al., 1 Jun 2024), spatial constraints in TumorGen (Liu et al., 30 May 2025), variational decoders in FlowSep (Yuan et al., 11 Sep 2024)).

6. Contemporary Extensions and Open Challenges

Recent works push the boundaries of RFM:

  • Latent-constrained and manifold-aware flows: Latent-CFM (Samaddar et al., 7 May 2025) incorporates pretrained deep latent variable models to condition the transport path, yielding improved efficiency and generation quality on multi-modal or physically-constrained data.
  • Momentum flows for diversity: Discretized-RF (Ma et al., 10 Jun 2025) introduces stochasticity into sub-path velocity fields (momentum fields), tackling the diversity and multi-scale modeling limitations inherent to pure straight-line dynamics.
  • Hierarchical and mini-batch coupling: Hierarchical RFM (Zhang et al., 17 Jul 2025) models velocity (and higher-order) distributions at multiple levels, with mini-batch optimal transport couplings used to gradually reduce distribution complexity across hierarchy levels, thus improving efficiency and supporting multi-modality.
  • Infinite-dimensional, functional generative models: Functional RFM (Zhang et al., 12 Sep 2025) extends the entire framework to separable infinite-dimensional Hilbert spaces, with rigorous construction via the superposition principle for continuity equations, and demonstrates state-of-the-art results in image and PDE data modeling.
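In one dimension, the mini-batch optimal transport coupling mentioned above reduces to monotone (sorted) pairing — a simplification; higher-dimensional batches require an assignment solver. The sketch compares the transport cost of OT-coupled versus independently paired batches.

```python
import numpy as np

rng = np.random.default_rng(3)
batch = 256
x0 = rng.standard_normal(batch)            # source mini-batch
x1 = 4.0 + 2.0 * rng.standard_normal(batch)  # target mini-batch

# In 1-D, the mini-batch OT coupling is the monotone (sorted) pairing.
x0_sorted = np.sort(x0)
x1_sorted = np.sort(x1)

# Quadratic transport cost: OT coupling vs. independent (random) pairing.
cost_ot = np.mean((x1_sorted - x0_sorted) ** 2)
cost_rand = np.mean((x1 - x0) ** 2)
print(cost_ot, cost_rand)
```

The OT-coupled pairs yield shorter, non-crossing interpolation paths, which is why such couplings reduce the trajectory curvature that reflow would otherwise have to remove.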

Open challenges include the principled handling of multi-modal velocity fields without collapsing to mean directions (addressed, e.g., by variational approaches (Guo et al., 13 Feb 2025)), efficient simulation-free coupling in hierarchical models, and reconciling gradient-constrained RFM with broader classes of optimal transports under relaxed assumptions.

7. Summary and Outlook

Rectified Flow Matching constitutes a versatile, mathematically principled, and empirically validated approach for constructing efficient generative models across a diverse array of modalities and problem domains. Its hallmark is the construction and refinement of nearly straight (first-order) ODE trajectories between source and target distributions, enabling rapid sampling with large integration steps and decoupling convergence from the limitations of simulation-based diffusion frameworks. Recent work continues to generalize and refine the RFM paradigm, embedding latent structure, supporting greater diversity, and achieving substantial impact in both scientific and creative domains.
