Rectified Flow: Deterministic ODE Modeling

Updated 23 January 2026

Rectified Flow is a deterministic ODE-based generative modeling framework that enforces straight-line trajectories from simple to complex distributions.
It minimizes the squared residual between the constant velocity of the straight-line path and the learned velocity field, reducing transport error and computational cost.
RF supports plug-and-play priors and has been applied across domains such as image synthesis, protein design, and 3D modeling, with reported state-of-the-art performance.

Rectified Flow (RF) is a deterministic, ODE-based generative modeling framework that enforces straight-line transport between a simple source distribution (e.g., Gaussian noise) and a complex target distribution (e.g., natural images, proteins, audio, 3D data). By directly minimizing the discrepancy from the straight-line path, RF achieves fast, high-fidelity generation and serves as a unified approach for generative modeling, plug-and-play priors, domain transfer, and optimal transport. It rests on a growing body of theory, and practical implementations span image, audio, protein, and 3D Gaussian Splatting domains, yielding state-of-the-art sample quality at substantially reduced computational cost.

1. Fundamental Principles and Mathematical Formulation

Rectified Flow posits a deterministic ODE,

$\frac{\mathrm{d}Z_t}{\mathrm{d}t} = v_\phi(Z_t,\,t),\qquad t\in[0,1],\quad Z_0\sim\pi_0,$

where $\pi_0$ (typically Gaussian noise) is mapped via the learned velocity field $v_\phi$ into $\pi_1$ (target data) at $t=1$ (Yang et al., 2024, Liu et al., 2022, Bansal et al., 2024).

Training regresses the velocity field onto the constant velocity of the straight-line (linear) interpolation

$X_t = (1-t)\,X_0 + t\,X_1,\qquad X_0\sim\pi_0,\;\; X_1\sim\pi_1,$

where, for the first rectified flow (1-RF), $X_0$ and $X_1$ are drawn independently. The vector field is learned by minimizing the squared residual

$\min_\phi\;\int_0^1 \mathbb{E}_{X_0,X_1} \big\| (X_1-X_0) - v_\phi(X_t,t) \big\|^2\,\mathrm{d}t.$

This contrasts with diffusion models, which train score networks using denoising score matching under an SDE.
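As a minimal sketch of the objective above, the snippet estimates it by Monte Carlo on a 1D toy problem, assuming $\pi_0=\mathcal{N}(0,1)$, a hypothetical target $\pi_1=\mathcal{N}(2,0.5^2)$, and the independent coupling of 1-RF. The helpers `sample_pi0`, `sample_pi1`, and `rf_loss` are illustrative names, not an existing API; a real implementation would minimize this loss over the parameters $\phi$ of a neural network with stochastic gradient descent rather than merely evaluating it for two fixed fields.

```python
import random

def sample_pi0(rng):
    # Source distribution: standard Gaussian noise.
    return rng.gauss(0.0, 1.0)

def sample_pi1(rng, m=2.0, s=0.5):
    # Hypothetical 1D "data" distribution N(m, s^2), for illustration only.
    return rng.gauss(m, s)

def rf_loss(v, n=20000, seed=0):
    """Monte Carlo estimate of int_0^1 E||(X1 - X0) - v(X_t, t)||^2 dt,
    with X0 ~ pi0 and X1 ~ pi1 drawn independently (the 1-RF coupling)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x0, x1 = sample_pi0(rng), sample_pi1(rng)
        t = rng.random()                  # t ~ Uniform[0, 1)
        xt = (1.0 - t) * x0 + t * x1      # point on the straight line
        target = x1 - x0                  # the line's constant velocity
        total += (target - v(xt, t)) ** 2
    return total / n

def zero_field(x, t):
    return 0.0

def mean_field(x, t):
    return 2.0    # always predicts the mean displacement E[X1 - X0]

print(rf_loss(zero_field))   # population value: E[(X1 - X0)^2] = 1 + 0.25 + 4 = 5.25
print(rf_loss(mean_field))   # population value: Var(X1 - X0) = 1.25
```

Both calls use the same seed, so they are scored on identical samples. Neither field reaches zero loss: under independent pairing, the target $X_1-X_0$ is not a function of $X_t$ alone, so some residual variance is irreducible.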

The minimizer admits an alternative conditional-expectation characterization,

$v^*(x,t) = \mathbb{E}[ X_1 - X_0 \mid X_t = x ],$

and, under mild assumptions, the ODE solution $Z_t$ driven by $v^*$ has the same marginal law as $X_t$ for every $t\in[0,1]$, so RF preserves marginals exactly (Mena et al., 5 Nov 2025, Liu, 2022).
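The conditional-expectation formula can be checked in closed form on a Gaussian toy problem. Assuming independent $X_0\sim\mathcal{N}(0,1)$ and $X_1\sim\mathcal{N}(m,s^2)$ (a hypothetical setup chosen because $(X_1-X_0, X_t)$ is then jointly Gaussian), the field is linear in $x$: $v^*(x,t) = m + \frac{t s^2 - (1-t)}{(1-t)^2 + t^2 s^2}\,(x - tm)$. Integrating the ODE with many Euler steps should then map $z_0$ to $m+s\,z_0$, which pushes $\mathcal{N}(0,1)$ onto $\mathcal{N}(m,s^2)$, as marginal preservation predicts. The names `v_star` and `euler` are illustrative.

```python
M, S = 2.0, 0.5   # hypothetical target pi1 = N(M, S^2); source pi0 = N(0, 1)

def v_star(x, t, m=M, s=S):
    """Exact E[X1 - X0 | X_t = x] for independent X0 ~ N(0, 1), X1 ~ N(m, s^2).
    (X1 - X0, X_t) is jointly Gaussian, so the conditional mean is linear in x."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2    # Var(X_t)
    cov = t * s ** 2 - (1.0 - t)              # Cov(X1 - X0, X_t)
    return m + cov / var_t * (x - t * m)      # E[X_t] = t*m, E[X1 - X0] = m

def euler(v, z, n_steps):
    """Explicit Euler integration of dZ/dt = v(Z, t) from t = 0 to t = 1."""
    dt = 1.0 / n_steps
    t = 0.0
    for _ in range(n_steps):
        z = z + dt * v(z, t)
        t += dt
    return z

# Analytically Z_t = sqrt((1-t)^2 + t^2 s^2) * Z_0 + t*m, so Z_1 = m + s*Z_0,
# which maps N(0, 1) onto N(m, s^2): the marginals are preserved.
for z0 in (-1.0, 0.0, 1.5):
    print(z0, euler(v_star, z0, 1000), M + S * z0)
```

Note that at $t=1$ the field reduces to $v^*(x,1)=x$, and that the trajectories are curved (their speed $\tfrac{\mathrm{d}}{\mathrm{d}t}\sqrt{(1-t)^2+t^2s^2}\,z_0 + m$ varies with $t$), which is why many solver steps are used here.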

2. Trajectory Structure, Straightness, and Theoretical Guarantees

The distinguishing feature of RF is that it drives the joint coupling toward "straight" paths:

A 1-rectified flow reduces (more precisely, never increases) convex transport costs relative to the coupling it is trained on, and produces approximately straight flows (Bansal et al., 2024, Thm. 2).
After two rectifications (2-RF), the coupling is straight in the sense that

$\mathbb{E}[Z_1 - Z_0 | t Z_1 + (1-t) Z_0 ] = Z_1 - Z_0$

almost surely for all $t$ (Bansal et al., 2024). In one dimension, 1-RF already recovers the Monge map for quadratic cost.
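The one-dimensional statement has a simple mechanism: 1D ODE trajectories cannot cross, so the coupling $(Z_0, Z_1)$ produced by 1-RF is monotone, and the monotone coupling between two fixed marginals is the quantile (sorted) matching, which is optimal for quadratic and other convex costs. The sketch below, assuming hypothetical marginals $\mathcal{N}(0,1)$ and $\mathcal{N}(2,0.5^2)$, builds that matching on empirical samples and compares its quadratic cost with the independent pairing used to train 1-RF; `monotone_coupling` and `quad_cost` are illustrative helpers.

```python
import random

def monotone_coupling(x0s, x1s):
    """Pair the k-th smallest source sample with the k-th smallest target sample
    (the 1D quantile matching, i.e. the monotone transport map)."""
    return list(zip(sorted(x0s), sorted(x1s)))

def quad_cost(pairs):
    """Average squared displacement E[(X1 - X0)^2] of a list of (x0, x1) pairs."""
    return sum((b - a) ** 2 for a, b in pairs) / len(pairs)

rng = random.Random(0)
x0s = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # noise samples
x1s = [rng.gauss(2.0, 0.5) for _ in range(1000)]   # hypothetical data samples

independent = list(zip(x0s, x1s))     # the pairing used to train 1-RF
monotone = monotone_coupling(x0s, x1s)

print(quad_cost(independent))   # population value: 1 + 0.25 + 4 = 5.25
print(quad_cost(monotone))      # population value: (1 - 0.5)^2 + 4 = 4.25
```

On any finite sample, the sorted matching is never worse than another pairing of the same points (rearrangement inequality), and its population limit is the map $z\mapsto 2+0.5\,z$.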

Successive rectification provably straightens paths: the straightness defect of the best of the first $K$ rectified flows is bounded by $O(1/K)$, with marginals preserved at all iterations (Liu et al., 2022).
Transport error in Wasserstein distance is controlled by the discretization step size and the straightness defect, decaying as $O(N^{-1})$ in the number of discretization steps $N$ (Bansal et al., 2024).
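Straightness can be quantified by a measure of the form $S(Z)=\int_0^1\mathbb{E}\,\|(Z_1-Z_0)-\dot Z_t\|^2\,\mathrm{d}t$, which is zero exactly when every trajectory moves at constant velocity. The sketch below is a discretized version evaluated along Euler-simulated trajectories in 1D; the two example fields are hypothetical stand-ins (a constant translation, which is straight, and the decay field $v(x,t)=-x$, whose paths change speed over time).

```python
def straightness(v, z0s, n_steps=200):
    """Discretized S = int_0^1 E||(Z_1 - Z_0) - v(Z_t, t)||^2 dt along Euler paths
    of dZ/dt = v(Z, t), averaged over the starting points z0s (1D for simplicity)."""
    dt = 1.0 / n_steps
    total = 0.0
    for z0 in z0s:
        z, t, vels = z0, 0.0, []
        for _ in range(n_steps):       # simulate one trajectory, recording velocities
            vel = v(z, t)
            vels.append(vel)
            z += dt * vel
            t += dt
        disp = z - z0                  # Z_1 - Z_0 for this trajectory
        total += sum((disp - vel) ** 2 for vel in vels) * dt
    return total / len(z0s)

def translate(x, t):
    return 1.0      # constant velocity: every path is straight

def decay(x, t):
    return -x       # Z_t = z0 * exp(-t): speed changes along the path

starts = [-1.5, -0.5, 0.5, 1.5]
print(straightness(translate, starts))  # zero up to floating-point rounding
print(straightness(decay, starts))      # strictly positive
```

For the decay field the continuous-time value is $(\tfrac{1-e^{-2}}{2}-(1-e^{-1})^2)\cdot\mathbb{E}[z_0^2]\approx 0.04$ for these starting points, so the discretized estimate should be close to that.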

3. Practical Algorithms, Sampling, and Efficiency

Sampling

Generation reduces to solving the neural ODE from $Z_0\sim\pi_0$ :

$Z_1 = Z_0 + \int_{0}^{1} v_\phi(Z_t,t)\,\mathrm{d}t.$

For nearly straight trajectories, even a single Euler or RK2 step can suffice for high fidelity; 3–5 steps are often sufficient for images and radar, and about 10 for audio (Yang et al., 2024, Liu et al., 2024, Luo et al., 7 Jan 2026).
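To see why straightness governs the step count, the sketch below uses a hypothetical Gaussian toy ($\pi_0=\mathcal{N}(0,1)$, $\pi_1=\mathcal{N}(m,s^2)$ with $m=2$, $s=0.5$) and compares a one-step Euler sampler on two fields written in closed form rather than learned: the curved 1-RF field learned from independent pairs, and the straight field of the rectified coupling $z_1=m+s\,z_0$, obtained by inverting $x=tm+(1+t(s-1))\,z_0$ for $z_0$ and returning $z_1-z_0$. Function names are illustrative.

```python
M, S = 2.0, 0.5   # hypothetical target N(M, S^2); source N(0, 1)

def v_1rf(x, t, m=M, s=S):
    """Closed-form 1-RF field E[X1 - X0 | X_t = x] for independent Gaussian pairs."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    cov = t * s ** 2 - (1.0 - t)
    return m + cov / var_t * (x - t * m)

def v_2rf(x, t, m=M, s=S):
    """Velocity of the straight coupling z1 = m + s*z0 that reflow would train on."""
    z0 = (x - t * m) / (1.0 + t * (s - 1.0))   # recover the path's start (denominator >= s > 0)
    return (s - 1.0) * z0 + m                  # constant velocity z1 - z0

def euler_sample(v, z0, n_steps):
    """Generate Z_1 from Z_0 with n_steps explicit Euler steps."""
    dt = 1.0 / n_steps
    z, t = z0, 0.0
    for _ in range(n_steps):
        z += dt * v(z, t)
        t += dt
    return z

z0 = 1.0
print(euler_sample(v_1rf, z0, 1))     # 2.0: one step sends every z0 to the mean m
print(euler_sample(v_2rf, z0, 1))     # 2.5: equals m + s*z0 exactly
print(euler_sample(v_1rf, z0, 1000))  # close to 2.5 once the curved path is resolved
```

With the curved field, a single step uses only the initial velocity $m - z_0$ and therefore collapses all samples onto $m$; with the straight field, the initial velocity is already the full displacement.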

Plug-and-Play and Inversion

Pretrained RF models can be used as priors for loss-based optimization by evaluating the RF residual on custom generator outputs (e.g., NeRF, image renderers), enabling text-to-3D and image inversion/editing at greatly reduced cost relative to diffusion-based Score Distillation Sampling (SDS) (Yang et al., 2024).
Time symmetry of trajectories allows exact inversion (image→noise) in continuous time by running the RF ODE backward; because a discretized solver only approximates this, gradient-based refinement of the noise, as in iRFDS or DNAEdit, further improves inversion fidelity, enabling precise, drift-free image editing (Xie et al., 2 Jun 2025).
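The plug-and-play use of the RF residual can be illustrated with a deliberately tiny stand-in: the "pretrained" prior is the closed-form 1-RF field of a hypothetical Gaussian toy (noise $\mathcal{N}(0,1)$, target $\mathcal{N}(2,0.5^2)$), the "generator" is a single scalar $\theta$ whose output is $x=\theta$, and the update treats the prior's velocity as a constant (a stop-gradient, in the spirit of SDS). This is a sketch of the general idea only, not the RFDS algorithm of Yang et al. (2024), whose time sampling and weighting may differ.

```python
import random

M, S = 2.0, 0.5   # hypothetical prior: pi0 = N(0, 1), pi1 = N(M, S^2)

def v_prior(x, t, m=M, s=S):
    """Stand-in for a pretrained RF network: the exact 1-RF field of the toy prior."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    cov = t * s ** 2 - (1.0 - t)
    return m + cov / var_t * (x - t * m)

def rf_residual_grad(x, rng, batch=64):
    """Stop-gradient estimate of the gradient of 0.5*E||(x - eps) - v(x_t, t)||^2
    with respect to x, treating v's output as a constant (SDS-style)."""
    g = 0.0
    for _ in range(batch):
        eps = rng.gauss(0.0, 1.0)            # fresh source noise
        t = rng.uniform(0.02, 0.98)          # stay away from the exact endpoints
        xt = (1.0 - t) * eps + t * x         # noisy version of the generator output
        g += (x - eps) - v_prior(xt, t)      # RF residual
    return g / batch

rng = random.Random(0)
theta = -3.0        # parameter of a trivial "generator" whose output is x = theta
lr = 0.1
trace = []
for _ in range(400):
    theta -= lr * rf_residual_grad(theta, rng)   # dx/dtheta = 1 here
    trace.append(theta)

estimate = sum(trace[200:]) / len(trace[200:])   # average late iterates to damp noise
print(estimate)
```

Averaging over $\varepsilon$, the expected update direction for this prior works out to $(x-m)(1-t)/((1-t)^2+t^2s^2)$, which vanishes only at $x=m$, so $\theta$ is pulled toward the prior mean $m=2$. With a real renderer, the residual would additionally be back-propagated through the generator's Jacobian.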
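Inversion amounts to integrating the same ODE from $t=1$ back to $t=0$. The sketch below uses a hypothetical smooth velocity field as a stand-in for a trained model, runs Euler forward and then backward, and reports the round-trip error. The error shrinks as the step count grows but is not zero for a finite solver, which is one motivation for noise-refinement methods such as iRFDS or DNAEdit.

```python
import math

def v_toy(x, t):
    # Hypothetical velocity field standing in for a trained RF network.
    return 2.0 - x + math.sin(3.0 * t)

def integrate(v, z, t0, t1, n_steps):
    """Euler integration of dZ/dt = v(Z, t) from t0 to t1 (t1 < t0 runs backward)."""
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        z = z + dt * v(z, t)
        t += dt
    return z

def roundtrip_error(z0, n_steps):
    z1 = integrate(v_toy, z0, 0.0, 1.0, n_steps)       # noise -> sample
    z0_hat = integrate(v_toy, z1, 1.0, 0.0, n_steps)   # sample -> recovered noise
    return abs(z0_hat - z0)

for n in (10, 100, 1000):
    print(n, roundtrip_error(0.3, n))
```

Each forward/backward step pair leaves an $O(\Delta t^2)$ mismatch because the two passes evaluate the velocity at opposite ends of the interval, so the total round-trip error scales roughly as $O(1/N)$.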

Domain-Specific Adaptations

For protein backbone design, RF generalizes via geodesic interpolation on $\mathrm{SE}(3)^N$, with manifold-aware losses and noise-focused discretizations yielding 5–10× reductions in function evaluations for fixed design quality (Chen et al., 13 Oct 2025).
For audio, multi-band Rectified Flow processes STFT frames and subbands with substantial parallelism, producing competitive reconstructions in 10 steps and achieving near-real-time synthesis (Liu et al., 2024).
For radar nowcasting, RF training with near-linear objectives and guided feature fusion yields sharp, high-fidelity forecasts in 5 ODE steps (Luo et al., 7 Jan 2026).
For 3D Gaussian Splatting, multi-view RF generates image/depth/pose latents jointly, decoded by a learned GSDecoder, supporting direct 3D scene generation and editing (Go et al., 2024).
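For the protein-backbone adaptation, each residue frame is a rotation plus a translation. A common simplification, assumed here and not necessarily the exact construction of Chen et al., equips $\mathrm{SE}(3)$ with the product geometry of $\mathrm{SO}(3)\times\mathbb{R}^3$, so the geodesic interpolant is spherical linear interpolation (slerp) of unit quaternions for the rotation and a straight line for the translation, applied independently to each of the $N$ residues. Function names are illustrative.

```python
import math

def quat_slerp(q0, q1, t):
    """Geodesic (slerp) interpolation between unit quaternions q = (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                      # q and -q encode the same rotation: take the short arc
        q1 = tuple(-b for b in q1)
        dot = -dot
    theta = math.acos(min(1.0, dot))   # angle on the unit 3-sphere (half the rotation angle)
    if theta < 1e-8:                   # frames nearly identical: fall back to a linear blend
        q = tuple((1.0 - t) * a + t * b for a, b in zip(q0, q1))
    else:
        w0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        w1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)  # renormalize against rounding drift

def interpolate_frames(frames0, frames1, t):
    """Interpolate N residue frames, each a (unit quaternion, translation) pair:
    slerp for the rotation, a straight line for the translation."""
    out = []
    for (q0, p0), (q1, p1) in zip(frames0, frames1):
        q = quat_slerp(q0, q1, t)
        p = tuple((1.0 - t) * a + t * b for a, b in zip(p0, p1))
        out.append((q, p))
    return out

# Usage: halfway between the identity and a 90-degree turn about z is a 45-degree turn.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(interpolate_frames([(identity, (0.0, 0.0, 0.0))],
                         [(quarter_z, (2.0, 4.0, -2.0))], 0.5))
```

Under a left-invariant metric on $\mathrm{SE}(3)$ the geodesics would instead be screw motions coupling rotation and translation; the product choice keeps the two parts independent.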

4. Advanced Extensions and Methodological Innovations

Trajectory Diversity and Momentum Extensions

Standard RF trajectories are deterministic and may sacrifice diversity; Discretized-RF injects stochastic "momentum" on sub-path velocities, inducing multi-modal flows and improved multi-scale noise modeling while retaining ODE efficiency (Ma et al., 10 Jun 2025).
Momentum-flow matching, with partitioned velocity sub-paths and random field sampling (β, γ parameters), yields improved FID and recall, especially in high-variation domains.

Rectification, Reflow, and Data Efficiency

Iterated "reflow" using real/generated pairs drives the flows closer to straight; Balanced Conic Rectified Flow (BCRF) anchors reflow with real data using "conic" Slerp neighborhoods, reducing generative-pair requirements by >90% and sharply reducing distributional drift (Seong et al., 29 Oct 2025).
On CIFAR-10, BCRF achieves FID ≈5.5 in one-step Euler generation with only 350k generated + 50k real pairs.
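The Slerp primitive behind such "conic" neighborhoods can be sketched as follows. The snippet only shows spherical interpolation between a noise vector and fresh Gaussian draws. The helper `noise_neighborhood`, its `alpha` parameter, and how the perturbed noises would be paired with real images are hypothetical; BCRF's actual construction is specified in Seong et al. (29 Oct 2025).

```python
import math
import random

def slerp(z0, z1, alpha):
    """Spherical linear interpolation from vector z0 (alpha = 0) to z1 (alpha = 1)."""
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (n0 * n1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))   # angle between the vectors
    if omega < 1e-8:                                     # (nearly) parallel: nothing to rotate
        return list(z0)
    w0 = math.sin((1.0 - alpha) * omega) / math.sin(omega)
    w1 = math.sin(alpha * omega) / math.sin(omega)
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]

def noise_neighborhood(z, k, alpha, rng):
    """Hypothetical helper: k perturbations of noise z, each moved a fraction alpha
    of the way along the slerp arc toward an independent Gaussian draw."""
    out = []
    for _ in range(k):
        fresh = [rng.gauss(0.0, 1.0) for _ in z]
        out.append(slerp(z, fresh, alpha))
    return out

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(16)]   # noise paired with some real sample
neighbors = noise_neighborhood(z, k=4, alpha=0.1, rng=rng)
print(len(neighbors), len(neighbors[0]))        # 4 16
```

Slerp rather than linear mixing is the natural choice here because high-dimensional Gaussian noise concentrates near a sphere, and linear blends of two such vectors have noticeably smaller norm.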

Vanilla RF models may violate the boundary condition $v(x,1)=x$, which the exact velocity field satisfies when the source is zero-mean noise independent of the data (i.e., they may learn $v_\theta(x,1)\neq x$), destabilizing stochastic sampling. Boundary RF models (mask-based or subtraction-based) enforce such boundary constraints by design, improving FID by 8–9% on ImageNet and preventing singularities in score functions (Hu et al., 18 Jun 2025).
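The boundary condition follows from the interpolation itself: at $t=1$ we have $X_t=X_1$, so $v^*(x,1)=x-\mathbb{E}[X_0\mid X_1=x]$, which equals $x$ when $X_0$ is zero-mean noise drawn independently of $X_1$. Two ways to hard-wire it into an unconstrained network $f(x,t)$ are sketched below; these wrappers and the toy `f_raw` are illustrative and are not claimed to match the exact parametrizations of Hu et al.

```python
import math

def mask_based(f):
    """v(x, t) = t*x + (1 - t)*f(x, t): the mask (1 - t) switches f off at t = 1."""
    return lambda x, t: t * x + (1.0 - t) * f(x, t)

def subtraction_based(f):
    """v(x, t) = x + f(x, t) - f(x, 1): the offset f(x, 1) cancels at t = 1."""
    return lambda x, t: x + f(x, t) - f(x, 1.0)

def f_raw(x, t):
    # Hypothetical unconstrained "network" that ignores the boundary condition.
    return 0.7 + math.tanh(x) * (1.0 + t)

for wrap in (mask_based, subtraction_based):
    v = wrap(f_raw)
    print(wrap.__name__, [v(x, 1.0) for x in (-2.0, 0.0, 3.0)])  # matches x up to rounding
```

The mask-based form also changes the field for $t<1$ by blending in $t\,x$, whereas the subtraction-based form only shifts $f$ by an offset that vanishes at the boundary.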

5. RF in Relation to Diffusion and Optimal Transport

Comparison to Diffusion

Both RF and diffusion map noise to data, but RF uses a deterministic ODE matched to straight-line paths, while diffusion uses SDEs with score estimation (Yang et al., 2024, Esser et al., 2024).
Key connections:
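- Deterministic RF sampling solves an ODE, analogous to the "probability-flow" ODE of diffusion models, whereas DDPM sampling is stochastic (a discretization of a reverse-time SDE).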
- DDPM and RF are linked through stochastic localization under suitable time change and drift/randomness parametrizations (Roy et al., 21 Jan 2026).
- The iteration complexity of RF sampling can adapt to the intrinsic data dimension $k$, requiring $O(k/\varepsilon)$ steps to reach total-variation accuracy $\varepsilon$.

Optimal Transport

RF can be interpreted as a marginal-preserving, interior-point method for convex OT problems, operating entirely within the set of couplings $\Pi(\pi_0, \pi_1)$ (Liu, 2022).
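Under quadratic cost, 1-RF in 1D recovers the exact Monge map (Bansal et al., 2024). In higher dimensions, plain rectification lowers all convex transport costs at once but is not tailored to any single cost; Liu (2022) introduces a single-objective variant, c-rectified flow, that is guaranteed to solve the OT problem for a fixed, user-specified convex cost $c$ (for quadratic cost, the optimal map also characterized by the Benamou–Brenier dynamic formulation).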

6. Statistical Properties, Estimation, and Error Rates

Analysis of the statistical properties of RF reveals:

Existence, uniqueness, and regularity of RF maps under both unbounded (log-concave) and bounded (compact) support assumptions (Mena et al., 5 Nov 2025).
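Rates of convergence for kernel or regression-based estimators: in unbounded settings, the empirical RF map converges at rate $O(h^\beta + \sqrt{(\log n)/(nh^d)})$, the sum of a bias term and a variance term, where $h$ is the bandwidth, $\beta$ the smoothness order, $n$ the sample size, and $d$ the dimension; in bounded domains, the rates follow deconvolution exponents.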
Asymptotic normality (CLTs) for kernel-based estimators.
Practically, regression or density estimation tools suffice for empirical RF implementations with finite-sample guarantees.
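As one concrete instance of the regression-based estimators above, a Gaussian-kernel Nadaraya–Watson smoother estimates $v^*(x,t)=\mathbb{E}[X_1-X_0\mid X_t=x]$ directly from coupled samples, with the bandwidth $h$ playing the role it has in the bias–variance rate. The sketch assumes a hypothetical Gaussian toy (independent $X_0\sim\mathcal{N}(0,1)$, $X_1\sim\mathcal{N}(2,0.5^2)$), where the closed form gives $v^*(1,0.5)=2$ and $v^*(1.5,0.5)=1.4$; it is not the specific estimator analyzed by Mena et al.

```python
import math
import random

def nw_velocity(x, t, pairs, h):
    """Nadaraya-Watson estimate of E[X1 - X0 | X_t = x] with a Gaussian kernel
    of bandwidth h, computed from coupled samples (x0, x1)."""
    num = den = 0.0
    for x0, x1 in pairs:
        xt = (1.0 - t) * x0 + t * x1                # interpolant for this pair
        w = math.exp(-0.5 * ((x - xt) / h) ** 2)    # kernel weight
        num += w * (x1 - x0)
        den += w
    return num / den

# Hypothetical toy: independent X0 ~ N(0, 1) and X1 ~ N(2, 0.5^2).
rng = random.Random(0)
pairs = [(rng.gauss(0.0, 1.0), rng.gauss(2.0, 0.5)) for _ in range(4000)]

for x, exact in ((1.0, 2.0), (1.5, 1.4)):
    print(x, nw_velocity(x, 0.5, pairs, h=0.1), exact)
```

Shrinking $h$ lowers the $O(h^\beta)$ bias but leaves fewer effective samples inside each kernel window, which is the trade-off the rate expresses.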

7. Applications, Performance, and Open Directions

Empirical Highlights

Image synthesis: SOTA FID/IS on CIFAR-10 with 2–3 rectifications and one-step or few-step sampling (Liu et al., 2022, Esser et al., 2024).
Text-to-3D and 2D editing: RF-based priors outperform SDS/VSD in quality and speed, reduce optimization burden for NeRFs, and improve image inversion and editing fidelity (Yang et al., 2024, Xie et al., 2 Jun 2025, Chen et al., 16 Sep 2025).
3D Gaussian Splatting: SplatFlow matches or improves upon specialized pipelines for direct 3DGS synthesis and editing (Go et al., 2024).
Protein design: ReFlow achieves 5–10× speedups over conventional flow matching with careful coupling and discretization methodology (Chen et al., 13 Oct 2025).

Limitations and Research Opportunities

Regularization: Boundary enforcement and curvature reduction remain important for robustness and quality across domains (Hu et al., 18 Jun 2025, Seong et al., 29 Oct 2025).
Domain transfer: Heuristics effective in vision may degrade protein/geometric modeling performance, necessitating problem-specific adaptation (Chen et al., 13 Oct 2025).
Statistical rates: Variance amplification at endpoints, discretization error, and inversion error in high dimensions invite further theoretical analysis (Mena et al., 5 Nov 2025, Bansal et al., 2024).
Extending self-supervised fine-tuning methods (e.g., RFMI) for better prompt conditioning and alignment stands as a promising direction, as does leveraging low-dimensional structure via adaptive time discretization (Roy et al., 21 Jan 2026, Wang et al., 18 Mar 2025).