Rectified Flow Models in Generative Modeling
- Rectified flow models are generative models that learn a time-dependent velocity field to transform a simple base distribution into a target data distribution via near-straight ODE trajectories.
- They employ a regression approach to approximate data displacements, enabling high-quality synthesis with drastically fewer sampling steps compared to stochastic diffusion methods.
- Advancements like recursive rectification and architecture-specific adaptations improve model fidelity and broaden applications across images, audio, text, and fluid simulations.
Rectified flow models are generative models that transport a simple source distribution (e.g., standard Gaussian) to a target data distribution (such as images, speech, or discrete text) by learning an ordinary differential equation (ODE) whose trajectories are as “straight” as possible in sample space. Unlike score-based diffusion models, which involve stochastic differential equations with hundreds of sampling steps, rectified flow learns a time-dependent velocity field regressed to straight-line displacements, enabling high-quality generation with drastically fewer steps. The rectified flow paradigm has seen rapid expansion and innovation, spanning continuous, discrete, hierarchical, and multi-modal domains, and has initiated new directions in optimal transport, model distillation, editing, and plug-and-play applications.
1. Mathematical Foundation and Core Objective
The core objective of rectified flow is to learn a velocity field $v_\theta(x, t)$ that transforms a base distribution $\pi_0$ (often $\mathcal{N}(0, I)$) into a target distribution $\pi_1$ (e.g., images, audio). For paired samples $(X_0, X_1) \sim \pi_0 \times \pi_1$, the straight-line interpolation is $X_t = t X_1 + (1 - t) X_0$. The model $v_\theta$ regresses to approximate the displacement direction at each $t$:
$$\min_\theta \int_0^1 \mathbb{E}\left[\, \| (X_1 - X_0) - v_\theta(X_t, t) \|^2 \,\right] dt.$$
The generative ODE is $dZ_t = v_\theta(Z_t, t)\, dt$ with $Z_0 \sim \pi_0$. If $v_\theta$ exactly reproduced the displacement $X_1 - X_0$ along every path, the trajectories between $\pi_0$ and $\pi_1$ traced by the ODE would be straight; matching the conditional expectation $\mathbb{E}[X_1 - X_0 \mid X_t]$ guarantees that the distribution of $Z_1$ matches $\pi_1$ exactly (Liu et al., 2022, Guan et al., 2023).
Distinct from score-based diffusion, which models the (usually non-linear, stochastic) score field $\nabla_x \log p_t(x)$, rectified flow directly fits the conditional expectation of the data displacement, resulting in deterministic, straight or nearly-straight paths.
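The objective above can be made concrete in a few lines. The sketch below is a minimal NumPy illustration, assuming a toy 1-D Gaussian source and target (not any paper's implementation); it computes the interpolant $X_t$, the regression target $X_1 - X_0$, and the squared-error objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setup: base pi_0 = N(0, 1), target pi_1 = N(4, 0.5^2).
x0 = rng.normal(0.0, 1.0, size=4096)   # X_0 ~ pi_0
x1 = rng.normal(4.0, 0.5, size=4096)   # X_1 ~ pi_1 (independent coupling)
t = rng.uniform(0.0, 1.0, size=4096)   # interpolation times t ~ U[0, 1]

xt = t * x1 + (1.0 - t) * x0           # straight-line interpolant X_t
target = x1 - x0                       # regression target X_1 - X_0

def rf_loss(v_pred, target):
    """Rectified-flow objective: E || (X_1 - X_0) - v(X_t, t) ||^2."""
    return np.mean((target - v_pred) ** 2)

# A velocity field that ignores (x, t) can at best predict the mean
# displacement; its loss equals the variance of X_1 - X_0.
const_v = np.full_like(target, target.mean())
print(rf_loss(const_v, target))
```

A learned $v_\theta(X_t, t)$ must beat this constant baseline by exploiting the interpolant and time, which is exactly what the conditional-expectation fit provides.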
2. Recursive Rectification and Trajectory Straightening
Early rectified flow models observed that a single regression (1-RF) typically yields trajectories that are close to straight but not exact. By recursively re-coupling synthetic pairs generated from the current flow and retraining (the "reflow" step), each successive rectification (2-RF, 3-RF, etc.) yields increasingly straight ODE paths:
- After each reflow step, the conditional expectation velocity is updated using pairs from the previous flow's samples.
- Theoretical results prove a monotonic decrease in convex transport costs (e.g., Wasserstein distance) and decreasing "straightness" error with each recursion (Liu et al., 2022, Bansal et al., 2024).
- Empirical studies confirm that after one or two rectified steps, even a single-step Euler integration often suffices for high-fidelity generation (e.g., image FID, TTS MOS) (Guan et al., 2023, Liu et al., 2022).
The recursive process:
- rectify: draw $Z_0^k \sim \pi_0$, obtain $Z_1^k$ by integrating the $k$-th flow's ODE, and take $(Z_0^k, Z_1^k)$ as the new coupling.
- $v_\theta^{k+1}$ is regressed as above on the new pairs $(Z_0^k, Z_1^k)$.
This procedure underlies one-step distillation and efficient few-step models (Zhu et al., 2024).
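The reflow recursion can be sketched end-to-end on a toy 1-D problem. This is a hedged NumPy example: the linear velocity model and Gaussian distributions are deliberate simplifications standing in for the deep networks used in practice.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_velocity(x0, x1):
    """Least-squares fit of v(x, t) = w0*x + w1*t + w2 to displacements X1 - X0."""
    t = rng.uniform(0.0, 1.0, size=len(x0))
    xt = t * x1 + (1.0 - t) * x0
    feats = np.stack([xt, t, np.ones_like(t)], axis=1)
    w, *_ = np.linalg.lstsq(feats, x1 - x0, rcond=None)
    return w

def integrate(w, z0, steps=50):
    """Euler integration of dz/dt = v(z, t) from t = 0 to t = 1."""
    z, dt = z0.astype(float).copy(), 1.0 / steps
    for i in range(steps):
        z = z + dt * (w[0] * z + w[1] * (i * dt) + w[2])
    return z

x0 = rng.normal(0.0, 1.0, 4096)
x1 = rng.normal(3.0, 1.0, 4096)        # independent coupling -> 1-RF
w = fit_velocity(x0, x1)

# Reflow: re-couple using the current flow's own (start, end) pairs, refit.
for _ in range(2):                     # yields 2-RF, 3-RF
    z0 = rng.normal(0.0, 1.0, 4096)
    z1 = integrate(w, z0)
    w = fit_velocity(z0, z1)

# After reflow the coupling is (nearly) deterministic, so even a single
# Euler step transports pi_0 close to the target mean of pi_1.
z0 = rng.normal(0.0, 1.0, 4096)
one_step = z0 + 1.0 * (w[0] * z0 + w[1] * 0.0 + w[2])
print(one_step.mean())
```

The displacement variance of the re-coupled pairs collapses across reflow iterations, which is the mechanism behind the one-step distillation mentioned above.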
3. Algorithmic and Architectural Details
The parameterization of $v_\theta$ varies by domain:
- Images/audio: U-Nets, convolutional backbones, or diffusion-transformers (DiT), with time encoded via sinusoidal embeddings (Ma et al., 12 Mar 2025, Braunstein et al., 2024).
- Text/layouts: Token-based transformers, often utilizing object/meta-embeddings and prompt conditioning for text-to-layout (Braunstein et al., 2024).
- Fluid/PDEs: Velocity networks, sometimes with FiLM conditioning on initial/boundary data and multi-head attention (Armegioiu et al., 3 Jun 2025).
- Discrete data: Rectified discrete flows operate by recoupling discrete variables and leveraging conditional total correlation as an error metric (Yoo et al., 21 Jul 2025).
The typical training pseudocode:
- Sample $X_0 \sim \pi_0$, $X_1 \sim \pi_1$, $t \sim \mathcal{U}[0, 1]$.
- Compute $X_t = t X_1 + (1 - t) X_0$.
- Output $v_\theta(X_t, t)$; minimize $\| (X_1 - X_0) - v_\theta(X_t, t) \|^2$.
- Update $\theta$ by gradient descent.
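These steps can be turned into a runnable loop. The sketch below is a hedged illustration: the linear parameterization of $v_\theta$ and the 1-D Gaussians are assumptions made for self-containedness, standing in for a neural network and real data.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.zeros(3)                    # toy linear field v(x, t) = th0*x + th1*t + th2

def v(theta, x, t):
    return theta[0] * x + theta[1] * t + theta[2]

lr = 0.05
for step in range(2000):
    x0 = rng.normal(0.0, 1.0, 256)     # sample X_0 ~ pi_0
    x1 = rng.normal(2.0, 1.0, 256)     # sample X_1 ~ pi_1
    t = rng.uniform(0.0, 1.0, 256)     # sample t ~ U[0, 1]
    xt = t * x1 + (1.0 - t) * x0       # compute X_t
    err = v(theta, xt, t) - (x1 - x0)  # residual of ||(X_1 - X_0) - v_theta||^2
    grad = 2.0 * np.array([np.mean(err * xt), np.mean(err * t), np.mean(err)])
    theta -= lr * grad                 # update theta
```

For conditional generation, the conditioning embeddings would simply enter `v` as additional arguments alongside `x` and `t`.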
For conditional generation (e.g., text-to-image, TTS), $v_\theta$ receives additional conditioning inputs (CLIP, FastSpeech, language embeddings) (Guan et al., 2023, Ma et al., 12 Mar 2025).
4. Theoretical Guarantees, Optimal Transport, and Extensions
Rectified flow uniquely preserves marginals at every interpolation time, and, unlike extrinsic penalty-based OT solvers, maintains valid coupling throughout all recursive steps:
- Marginal preservation: The ODE-sampled points $Z_t$ have the correct time-marginals by construction, irrespective of neural network error (Liu, 2022).
- Cost monotonicity: Any convex transport cost is non-increasing under each rectification; the iterative procedure converges to an optimal transport plan for the chosen cost under mild regularity.
- Strong straightness: Successive rectification yields couplings where increments along the path are predictable functions of the chord $Z_1 - Z_0$, a stronger criterion than average curvature (Bansal et al., 2024).
- Extensions: Hierarchical formulations model higher derivatives (velocity, acceleration), permitting intersecting ODE paths, further reducing trajectory curvature and number of function evaluations (Zhang et al., 24 Feb 2025).
- Balanced/Conic flows: Recent approaches integrate real samples and geometric constraints (slerp-cones) to prevent error accumulation, sampling drift, or model collapse when relying heavily on synthetic pairs during recursive reflow (Seong et al., 29 Oct 2025, Zhu et al., 2024).
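The straightness criterion above can be estimated numerically. The sketch below is an illustrative NumPy construction (not from any cited paper): it approximates $S = \int_0^1 \mathbb{E}\,\|(Z_1 - Z_0) - \dot{Z}_t\|^2\, dt$ by finite differences, showing that $S$ vanishes for the linear interpolant and is positive for a curved path.

```python
import numpy as np

def straightness(z0, z1, path, n_t=64):
    """Finite-difference estimate of S = ∫ E||(Z1 - Z0) - dZ_t/dt||^2 dt."""
    dt = 1.0 / n_t
    total = 0.0
    for k in range(n_t):
        t = k * dt
        vel = (path(z0, z1, t + dt) - path(z0, z1, t)) / dt  # approx. dZ_t/dt
        total += np.mean((z1 - z0 - vel) ** 2) * dt
    return total

rng = np.random.default_rng(3)
z0, z1 = rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000)

line = lambda a, b, t: (1.0 - t) * a + t * b              # straight interpolant
bump = lambda a, b, t: line(a, b, t) + np.sin(np.pi * t)  # curved detour

print(straightness(z0, z1, line))   # ~0: perfectly straight
print(straightness(z0, z1, bump))   # >0: curvature is penalized
```

For the sinusoidal detour the integral evaluates analytically to $\pi^2/2 \approx 4.93$, which the finite-difference estimate recovers closely.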
5. Empirical Applications and Benchmarks
Rectified flow models demonstrate state-of-the-art or near-SOTA performance in a wide array of generative modeling tasks:
- Image Generation: Euler or RK solvers with as few as 1–8 steps suffice for high-fidelity image synthesis (e.g., NAMI, Flux, InstaFlow) (Ma et al., 12 Mar 2025, Liu et al., 2022, Yang et al., 2024). On CIFAR-10, one-step flows achieve an FID of roughly 5.1 with compact models (Zhu et al., 2024). Progressive and multiresolution architectures (as in NAMI) further accelerate inference (Ma et al., 12 Mar 2025).
- Text-to-Speech: ReFlow-TTS matches or surpasses diffusion-based models, reaching a MOS of about 4.5 in a fraction of the sampling time (Guan et al., 2023).
- Fluid Simulation: Rectified flows for multiscale PDEs recover fine statistical/moment properties, require only 4–8 inference steps, and outperform or match SDE-based conditional flows (Armegioiu et al., 3 Jun 2025).
- Discrete Data: Rectified discrete flows (ReDi) guarantee monotonic reduction in conditional total correlation of the modeled coupling, enabling one-step or few-step discrete data synthesis with large IS/FID improvements (Yoo et al., 21 Jul 2025).
- Layout and Multimodal Generation: SLayR and JanusFlow show that transformer-based rectified flows can be embedded into LLMs for layout or joint understanding/generation with competitive quality, parameter, and speed tradeoffs (Braunstein et al., 2024, Ma et al., 2024).
- Editing and Control: Semantic attribute disentanglement in FluxSpace demonstrates editable latent representations within rectified flow transformers (Dalva et al., 2024). Plug-and-play prior, zero-shot editing, and inversion are enabled by deterministic, time-symmetric ODE properties (Yang et al., 2024, Patel et al., 2024).
6. Guidance, Constraints, and Sampling Enhancements
Classifier-free guidance (CFG), a mainstay in diffusion models, is not natively stable in rectified flow due to drift off the learned transport manifold. Recent advances propose geometry-aware, predictor-corrector schemes (Rectified-CFG++) that guarantee marginal proximity and bounded manifold deviation, enabling strong prompt adherence and artifact-free images across Flux, SD3/SD3.5, and Lumina (Saini et al., 9 Oct 2025). Boundary-enforced variants further impose analytic constraints on $v_\theta$ at the endpoints $t = 0, 1$, eliminating endpoint bias and stabilizing ODE/SDE sampling (Hu et al., 18 Jun 2025). Steering via the vector field can be performed in a deterministic, gradient-skipping fashion, facilitating tasks like classifier guidance, inpainting, and inverse-problem solving in a unified, memory- and compute-efficient manner (Patel et al., 2024).
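The baseline CFG combination that these correctors repair can be written in one line. This is a generic sketch of vanilla guidance transplanted onto a velocity field; the predictor-corrector machinery of Rectified-CFG++ itself is not reproduced here.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, w):
    """Vanilla classifier-free guidance on a velocity field:
    v = v_uncond + w * (v_cond - v_uncond).  For w > 1 this extrapolates
    beyond the learned field -- the off-manifold drift that geometry-aware
    correctors are designed to bound."""
    return v_uncond + w * (v_cond - v_uncond)

v_u = np.array([0.0, 1.0])   # unconditional velocity prediction
v_c = np.array([1.0, 1.0])   # text-conditional velocity prediction
print(guided_velocity(v_c, v_u, 3.0))   # w = 1 would recover v_c exactly
```

At each ODE step the guided velocity replaces $v_\theta$ in the Euler update; predictor-corrector variants then project the result back toward the learned marginals.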
7. Open Challenges and Future Directions
Active research addresses several limitations and new directions:
- Model collapse: Iterative reflow steps relying solely on synthetic pairs induce model collapse, analogous to DAE degeneration; interleaving real pairs or careful SDE reverse-simulation avoids rank-loss and sample diversity degradation (Zhu et al., 2024, Seong et al., 29 Oct 2025).
- Data efficiency and real-sample anchoring: Balanced Conic Rectified Flow sharply reduces the need for millions of synthetic pairs and improves robustness along real-image reversals (Seong et al., 29 Oct 2025).
- Scalability and compression: SlimFlow extends rectified flow to highly compressed, small-footprint models with one-step sampling, via annealing and flow-guided distillation (Zhu et al., 2024).
- Hierarchical and multimodal flows: Hierarchical rectified flows further minimize the number of NFEs by modeling full velocity (and higher derivative) distributions, permitting path intersection (Zhang et al., 24 Feb 2025).
- Connections to optimal transport: The c-rectified flow generalizes the methodology to directly solve Monge–Kantorovich OT for user-specified costs while preserving marginal constraints at every step (Liu, 2022).
- Domain transfer, adaptation, and fine-grained editing: Applications in domain adaptation, fine-grained style transfer, and inversion-free semantic editing are ongoing (Dalva et al., 2024, Yang et al., 2024, Patel et al., 2024).
Future advances are likely to focus on improved solver-adaptive schemes, further analysis of straightness and marginality in high dimensions, enhanced semantic disentanglement, and expansion into multimodal, hierarchical, and plug-and-play generative tasks.
References: (Liu et al., 2022, Guan et al., 2023, Liu, 2022, Bansal et al., 2024, Zhu et al., 2024, Ma et al., 12 Mar 2025, Patel et al., 2024, Dalva et al., 2024, Braunstein et al., 2024, Armegioiu et al., 3 Jun 2025, Seong et al., 29 Oct 2025, Yoo et al., 21 Jul 2025, Zhang et al., 24 Feb 2025, Hu et al., 18 Jun 2025, Li et al., 25 Nov 2025, Saini et al., 9 Oct 2025, Yang et al., 2024, Ma et al., 2024)