Low-Dimensional Adaptation of Rectified Flow: A New Perspective through the Lens of Diffusion and Stochastic Localization

Published 21 Jan 2026 in stat.ML, cs.AI, cs.LG, and math.ST | (2601.15500v1)

Abstract: In recent years, Rectified flow (RF) has gained considerable popularity largely due to its generation efficiency and state-of-the-art performance. In this paper, we investigate the degree to which RF automatically adapts to the intrinsic low dimensionality of the support of the target distribution to accelerate sampling. We show that, using a carefully designed choice of the time-discretization scheme and with sufficiently accurate drift estimates, the RF sampler enjoys an iteration complexity of order $O(k/\varepsilon)$ (up to log factors), where $\varepsilon$ is the precision in total variation distance and $k$ is the intrinsic dimension of the target distribution. In addition, we show that the denoising diffusion probabilistic model (DDPM) procedure is equivalent to a stochastic version of RF by establishing a novel connection between these processes and stochastic localization. Building on this connection, we further design a stochastic RF sampler that also adapts to the low-dimensionality of the target distribution under milder requirements on the accuracy of the drift estimates, and also with a specific time schedule. We illustrate with simulations on the synthetic data and text-to-image data experiments the improved performance of the proposed samplers implementing the newly designed time-discretization schedules.

Summary

  • The paper presents a novel framework demonstrating that rectified flow samplers can exploit intrinsic low-dimensional structure to achieve accelerated sampling.
  • It introduces a U-shaped non-uniform time discretization that enhances drift estimation and outperforms uniform grids across high ambient dimensions.
  • The work unifies rectified flow, DDPM, and stochastic localization into a robust stochastic variant (Stoc-RF) validated with practical experiments.

Low-Dimensional Adaptation of Rectified Flow: Theory, Methods, and Empirical Validation

Introduction

Rectified Flow (RF), and more broadly flow-matching generative modeling, has become prominent due to its deterministic ODE-based sampling, ease of simulation, and empirical efficacy for data across image, video, and audio modalities. This work introduces a rigorous theoretical and algorithmic framework, establishing for the first time that RF samplers can automatically exploit the intrinsic low-dimensional structure of the target distribution to achieve accelerated sampling, both in theory and practice. The analysis extends to connect RF and Denoising Diffusion Probabilistic Models (DDPM) via stochastic localization, offering a unified perspective and enabling the construction of a stochastic version of RF (Stoc-RF) with similarly favorable low-dimensional adaptation but under weaker requirements.

Theoretical Foundations

Rectified Flow and Its Deterministic Sampling

RF seeks to transform a simple base distribution (typically standard Gaussian) to a complex data distribution, implementing the transition through an ODE parameterized by the velocity function $v_t(x)$, computed as the conditional expectation $E[X_1-X_0 \mid X_t=x]$. The process is implemented via discretized Euler steps, where the quality of sampling is governed by the accuracy of the learned drift and the time-discretization scheme, both critical for preserving generative fidelity.
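The Euler discretization described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the standard linear interpolation $X_t = (1-t)X_0 + tX_1$ with $X_0 \sim N(0,1)$ independent of a one-dimensional Gaussian target $X_1 \sim N(\mu, s^2)$, for which the velocity has a closed form. The names `gaussian_velocity` and `euler_rf` are hypothetical.

```python
def gaussian_velocity(t, x, mu, s):
    """Closed-form v_t(x) = E[X_1 - X_0 | X_t = x] under the ASSUMED setup:
    X_0 ~ N(0, 1), X_1 ~ N(mu, s^2) independent, X_t = (1 - t) X_0 + t X_1."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2   # Var(X_t)
    cov = t * s ** 2 - (1.0 - t)            # Cov(X_1 - X_0, X_t)
    return mu + cov / var_t * (x - t * mu)  # Gaussian conditional mean


def euler_rf(x0, velocity, grid):
    """Explicit Euler for dx/dt = v_t(x) along an increasing time grid."""
    x = x0
    for t_cur, t_next in zip(grid[:-1], grid[1:]):
        x += (t_next - t_cur) * velocity(t_cur, x)
    return x


mu, s, n_steps = 2.0, 0.5, 1000
grid = [n / n_steps for n in range(n_steps + 1)]  # uniform grid on [0, 1]
v = lambda t, x: gaussian_velocity(t, x, mu, s)

x_from_0 = euler_rf(0.0, v, grid)  # pushes the base point 0 forward
x_from_1 = euler_rf(1.0, v, grid)  # pushes the base point 1 forward
```

In this Gaussian toy case the exact flow map is $x_0 \mapsto \mu + s\,x_0$, so `x_from_0` and `x_from_1` should land close to 2.0 and 2.5; with a learned velocity, the drift estimation error adds to the discretization error seen here.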

Low-Dimensional Adaptation: Intrinsic Dimension in Sampler Complexity

A significant advance of this work is the theoretical proof that, with carefully designed U-shaped non-uniform time discretization, the RF sampler achieves iteration complexity of order $O(k/\varepsilon)$ up to logarithmic factors, where $k$ is the intrinsic dimension of the support of the target distribution (as defined by its metric entropy), and $\varepsilon$ is the total variation error of the generated samples. This reveals that for data concentrated near low-dimensional manifolds ($k \ll d$), RF can converge much faster, independent of the ambient dimension $d$.

Time-Discretization and Its Role

The paper proposes and mathematically grounds a novel U-shaped time discretization schedule that concentrates discretization points near $t=0$ and $t=1$, improving the estimation and application of the learned drift where the dynamics are most demanding. This design both theoretically and empirically outperforms uniform time grids, especially as either the ambient dimension or iteration count increases.

Figure 1: U-shaped time discretization histogram, illustrating concentration of steps near endpoints where fast adaptation is needed.
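To make the shape of such a schedule concrete, the sketch below builds a grid whose points cluster near both endpoints. The summary does not state the paper's exact schedule, so cosine spacing is used here purely as a stand-in assumption; `u_shaped_grid` is a hypothetical name, and the paper's own grid may differ in its endpoints and spacing.

```python
import math


def u_shaped_grid(n_steps):
    """HYPOTHETICAL U-shaped grid on [0, 1] via cosine spacing.

    Points are dense near t = 0 and t = 1 and sparse in the middle.
    This is NOT the schedule from the paper, only an illustration.
    """
    return [0.5 * (1.0 - math.cos(math.pi * n / n_steps))
            for n in range(n_steps + 1)]


grid = u_shaped_grid(20)
steps = [b - a for a, b in zip(grid[:-1], grid[1:])]  # step sizes h_n
```

A histogram of `grid` has the U shape the figure describes: the first and last entries of `steps` are much smaller than those in the middle of the interval.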

Stochastic Localization and the Diffusion-Flow Connection

Stochastic Localization as a Unifying Formalism

Stochastic localization (SL) is leveraged as a framework to analyze high-dimensional probability paths. The paper constructs a precise equivalence between the paths traversed by RF, DDPM, and the SL process, via nontrivial time changes. This machinery allows tight transfers of convergence results amongst these samplers.

DDPM as Stochastic Rectified Flow

A central technical contribution is showing DDPM is equivalent in law to a stochastic variant of RF (Stoc-RF), wherein a Langevin “correction” is introduced at each step to mitigate error accumulation in the trajectory from imperfect velocity estimation. This stochasticity ensures robustness and removes the need for higher-order regularity assumptions on the learned model.

Figure 2: Trajectories of RF (deterministic ODE) and Stoc-RF (stochastic SDE) samplers for a 2-Gaussian target. The stochastic version maintains better fidelity, reducing spurious outlier generations.
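The idea of adding a Langevin-type term to the RF dynamics can be illustrated with a generic construction. The sketch below is not the paper's Stoc-RF, whose exact form is not given in this summary. It assumes the linear interpolation $X_t = (1-t)X_0 + tX_1$ with standard Gaussian $X_0$, under which the score satisfies $\nabla \log p_t(x) = -(x - t\,v_t(x))/(1-t)$, and it uses the noise level $g_t^2 = 2\epsilon(1-t)$, a choice made here only to keep the drift bounded. With an exact velocity, the SDE $dX = [v_t(X) - \epsilon\,(X - t\,v_t(X))]\,dt + \sqrt{2\epsilon(1-t)}\,dW$ has the same marginals as the ODE. All function names and the parameter `eps` are hypothetical.

```python
import math
import random


def gaussian_velocity(t, x, mu, s):
    """Exact v_t(x) for the ASSUMED toy target X_1 ~ N(mu, s^2), X_0 ~ N(0, 1)."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    return mu + (t * s ** 2 - (1.0 - t)) / var_t * (x - t * mu)


def stochastic_step(x, t, h, v, eps, rng):
    """One Euler-Maruyama step of the illustrative SDE
    dX = [v - eps * (X - t v)] dt + sqrt(2 eps (1 - t)) dW."""
    drift = v - eps * (x - t * v)  # velocity plus score-based correction
    noise = math.sqrt(2.0 * eps * (1.0 - t) * h) * rng.gauss(0.0, 1.0)
    return x + h * drift + noise


def sample(mu, s, n_steps, eps, rng):
    """Draw one sample by integrating from t = 0 to t = 1 on a uniform grid."""
    x = rng.gauss(0.0, 1.0)  # X_0 ~ N(0, 1)
    h = 1.0 / n_steps
    for n in range(n_steps):
        t = n * h
        x = stochastic_step(x, t, h, gaussian_velocity(t, x, mu, s), eps, rng)
    return x


rng = random.Random(0)
draws = [sample(2.0, 0.5, 200, 1.0, rng) for _ in range(2000)]
mean = sum(draws) / len(draws)
std = math.sqrt(sum((d - mean) ** 2 for d in draws) / len(draws))
```

Setting `eps = 0` recovers the deterministic Euler sampler. For `eps > 0`, the injected noise and the score-based drift term balance each other, so the empirical `mean` and `std` should stay close to the target values 2.0 and 0.5.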

Convergence Analysis

RF: Deterministic Sampler Convergence

Under the proposed U-shaped discretization and with stringent control on the approximation error of the velocity (and its higher-order derivatives), the paper shows RF achieves TV convergence with no explicit dependence on $d$, instead scaling linearly with $k$. The rate is:

$$\mathrm{TV}\left(p_{X_{N-1}}, p_{Y_{N-1}}\right) \leq O\left(\frac{k\log^3(1/\delta)}{N}\right) + \text{error terms}$$

with choices of $\delta$ and $N$ balancing statistical and computational tradeoffs. Critically, deterministic RF samplers require higher regularity conditions on the estimated drift due to lack of trajectory correction.
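As a short worked step, the discretization term of this bound can be inverted to recover the iteration complexity quoted in the abstract. The display below ignores the unspecified error terms and the constant hidden in the $O(\cdot)$, and writes $\varepsilon$ for the target total-variation precision as in the abstract, so it is a heuristic reading rather than a statement from the paper.

```latex
\frac{k \log^3(1/\delta)}{N} \le \varepsilon
\quad\Longleftrightarrow\quad
N \ge \frac{k \log^3(1/\delta)}{\varepsilon}
```

The required number of steps therefore grows linearly in $k$ and in $1/\varepsilon$, up to the logarithmic factor in $1/\delta$, and the ambient dimension $d$ does not appear.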

Stoc-RF: Stochastic Sampler Convergence

For Stoc-RF, the theoretical rate closely matches the deterministic case, but with no reliance on higher-order derivatives of the drift, an essential improvement for practical scenarios where neural estimators are used.

$$\mathrm{TV}\left(p_{Y_{t_{N-1}}}, p_{X_{t_{N-1}}}\right) \leq O\left(\frac{k\log^3 N}{N}\right) + O\left(\sqrt{\log N}\right)\varepsilon$$

Thus, Stoc-RF provides favorable guarantees under weaker model requirements.

Practical Consequences and Limitations

The main caveat, emphasized both theoretically and empirically, is that practical adaptation to low-dimensionality critically depends on the accuracy with which the drift (or score) is estimated, particularly in high dimensions or in the presence of support singularities.

Figure 3: The generation quality of RF and Stoc-RF deteriorates with increasing dimensionality, especially along the low-dimensional manifold; stochastic correction mitigates but does not eliminate this effect unless estimation is sufficiently accurate.

Experimental Support

Synthetic Low-Dimensional Gaussian

Systematic experiments with target Gaussian distributions of low intrinsic but high ambient dimension demonstrate that non-uniform (U-shaped) time discretization substantially improves adaptation, maintaining low total variation error even as $d$ increases and $k$ remains fixed.

Figure 2: RF and Stoc-RF trajectory visualization for a 2-Gaussian mixture, illustrating improved mode coverage and distribution matching for the stochastic sampler.

Text-to-Image Generation

Using the Flux model, which is pretrained with RF, the authors report that prompt-based text-to-image generation on challenging queries improves both qualitatively and quantitatively under their time discretization method. They report lower hallucination rates along with better text coherence and semantic completeness.

Figure 4: Text-to-image generation example, “A photo of a cat holding a sign {Rectified flow}”, as produced by the improved Flux/Rectified Flow pipeline.

Implications and Future Directions

Theoretical and Practical Impact

This work brings RF to parity with the theoretically well-understood convergence properties of diffusion models, opening new algorithmic possibilities for accelerated sampling in high-dimensional generative modeling. The unification of RF, DDPM, and SL enables cross-method improvements and import of techniques, e.g., accelerated samplers, from the diffusion literature to the flow paradigm. Furthermore, the simplicity of the U-shaped discretization makes it immediately applicable to pretrained RF models.

Open Challenges

Key directions remain, such as developing improved neural architectures for better velocity estimation along low-dimensional, possibly nonlinear, manifolds, especially for challenging endpoint behaviors. Additionally, transferring these insights to other variants of stochastic interpolants and their implementation at scale is poised for further exploration.

Conclusion

The paper establishes that rectified flow samplers can provably and empirically exploit intrinsic low-dimensional structure for faster, higher-quality generative sampling, provided a theoretically grounded non-uniform time-discretization is employed. By relating RF and stochastic RF to DDPM through the lens of stochastic localization, the work not only achieves new theoretical guarantees but also introduces practical algorithms that outperform existing methods, especially as the complexity of the data manifold grows. The equivalence results and corresponding adaptation theory promise cross-fertilization between flow-matching and diffusion-based generative approaches.

Figure 5: Text-to-image generation (Flux) on the prompt “A photo of a cat holding a sign {Rectified flow}”, visually supporting empirical claims of better text rendering and prompt adherence with the proposed methods.
