Improved Training Techniques for Rectified Flows: Enhancing Low NFE Performance
Rectified flows have emerged as a compelling alternative to diffusion models for image and video generation, particularly when the goal is to reduce the number of function evaluations (NFEs) required at sampling time. The paper under review introduces a suite of training techniques that substantially improve rectified flows, enabling them to compete with state-of-the-art distillation methods such as consistency distillation (CD) and progressive distillation (PD) even in the low-NFE regime.
Key Findings and Innovations
One-Round Reflow Sufficiency
A central claim of this work is that a single iteration of the Reflow algorithm suffices to learn nearly straight generative trajectories. Previous approaches applied Reflow repeatedly, which increased the computational burden and often caused errors to accumulate across rounds. By improving the training process, this paper demonstrates that one Reflow iteration can match or exceed the sample quality of multi-round training.
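The Reflow round described above can be sketched in a few lines: a trained 1-rectified flow is used to generate coupled (noise, sample) pairs by deterministic ODE integration, and the next flow is then trained on those pairs. The `velocity_model(x, t)` interface and Euler integration below are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def generate_reflow_pairs(velocity_model, n_pairs, shape, n_steps=100):
    """Generate coupled (noise, sample) pairs for one round of Reflow.

    `velocity_model(x, t)` is assumed to be a trained 1-rectified flow
    predicting the velocity field (name and signature are illustrative).
    Deterministic Euler integration of dx/dt = v(x, t) ties each noise
    draw z0 to the sample z1 it generates; the 2-rectified flow is then
    trained on these pairs with the usual flow-matching objective.
    """
    z0 = torch.randn(n_pairs, *shape)
    x = z0.clone()
    dt = 1.0 / n_steps
    with torch.no_grad():
        for i in range(n_steps):
            t = torch.full((n_pairs,), i * dt)
            x = x + dt * velocity_model(x, t)  # Euler step along the ODE
    return z0, x  # (noise, sample) training pairs
```

Because the pairs are coupled through a deterministic map, the retrained flow learns straighter paths between them than between independent noise/data draws.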
Enhanced Training Techniques
The following improvements were proposed for training rectified flows:
- U-shaped Timestep Distribution: Rather than sampling timesteps uniformly, training effort is concentrated on the most challenging timesteps by placing more probability mass near the endpoints of the time interval. On CIFAR-10, this yields a 28% reduction in FID compared to a uniform timestep distribution.
- LPIPS-Huber Loss Function: Replacing the traditional squared ℓ2 distance, the LPIPS-Huber loss combines perceptual similarity (LPIPS) with the robustness of the Huber loss. This improves the perceptual quality of generated images and reduces FID by up to 50% on some datasets.
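One simple way to realize a U-shaped timestep density is inverse-CDF sampling from π(t) ∝ cosh(a(t − 1/2)) on [0, 1]. This functional form and the value of `a` are illustrative assumptions for the sketch below, not necessarily the paper's exact choice.

```python
import torch

def sample_u_shaped_t(n, a=4.0):
    """Draw n timesteps from a U-shaped density on [0, 1].

    Illustrative form (an assumption, not necessarily the paper's):
    pi(t) proportional to cosh(a * (t - 0.5)), which places more mass
    near t = 0 and t = 1 than uniform sampling. Sampling is exact via
    the inverse CDF:
        F(t)      = (sinh(a(t - 1/2)) + sinh(a/2)) / (2 sinh(a/2))
        F^{-1}(u) = 1/2 + asinh((2u - 1) sinh(a/2)) / a
    """
    u = torch.rand(n)
    s = torch.sinh(torch.tensor(a / 2.0))
    return 0.5 + torch.asinh((2.0 * u - 1.0) * s) / a
```

During training, these samples would replace the uniform draw of `t` in the flow-matching loss; larger `a` concentrates more mass at the endpoints.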
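The loss can be sketched as a pseudo-Huber pixel-space term combined with a perceptual term. The exact weighting, the constant `c`, and the `lpips_fn` callable (e.g., from the `lpips` package) are assumptions for illustration, not the paper's verbatim formulation.

```python
import torch

def pseudo_huber(x, y, c=1e-3):
    """Per-sample pseudo-Huber distance: sqrt(||x - y||^2 + c^2) - c.
    Behaves like squared l2 for small errors and like l1 for large ones;
    the constant c is a hyperparameter (value illustrative)."""
    diff = (x - y).flatten(start_dim=1)
    return torch.sqrt(diff.pow(2).sum(dim=1) + c ** 2) - c

def lpips_huber_loss(x, y, lpips_fn, c=1e-3, lam=1.0):
    """Hypothetical LPIPS-Huber combination: a robust pixel-space term
    plus a weighted perceptual term. `lpips_fn(x, y)` is assumed to
    return a per-sample LPIPS distance (e.g., the `lpips` package)."""
    return (pseudo_huber(x, y, c) + lam * lpips_fn(x, y)).mean()
```

The robust term bounds the influence of outlier pixels, while the perceptual term aligns the objective with human similarity judgments rather than raw pixel error.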
Theoretical and Practical Implications
These improvements change how rectified flows are trained, allowing them to produce high-quality samples with far less sampling compute. The primary implication is that rectified flows can now compete with distilled diffusion models and other state-of-the-art methods in both one-step and two-step settings. The enhancements also hold promise beyond image generation, for example in image editing and watermarking, where the inversion capabilities of rectified flows are especially useful.
Empirical Evaluation
In rigorous experiments on CIFAR-10 and ImageNet 64×64 datasets, the enhanced rectified flows demonstrated:
- A reduction of up to 72% in FID.
- Comparable or superior performance to distillation-based methods: the 2-rectified flow achieved a one-step FID of 3.38, outperforming existing methods such as consistency distillation (CD) and progressive distillation (PD).
Beyond these quantitative gains, qualitative advantages were observed in applications requiring few-step inversion and image-to-image translation, with realistic noise inversions obtained at a minimal number of NFEs.
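Few-step inversion amounts to integrating the learned ODE backward, from the image at t = 1 to noise at t = 0. A minimal Euler sketch follows, assuming the same hypothetical `velocity_model(x, t)` interface as above; it is not the paper's exact procedure.

```python
import torch

def invert_to_noise(velocity_model, x1, n_steps=4):
    """Invert an image to its latent noise by integrating dx/dt = v(x, t)
    backward from t = 1 to t = 0 with Euler steps. Because reflowed
    trajectories are nearly straight, a handful of steps is often enough.
    The model interface is an illustrative assumption."""
    x = x1.clone()
    dt = 1.0 / n_steps
    with torch.no_grad():
        for i in range(n_steps, 0, -1):
            t = torch.full((x.shape[0],), i * dt)
            x = x - dt * velocity_model(x, t)  # backward Euler-style step
    return x  # recovered latent noise
```

For image editing, the recovered noise can be perturbed or re-integrated forward under a different condition, which is why cheap, accurate inversion matters.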
Future Directions in AI Research
This work opens avenues for further refinement of ODE-based generative models. The empirical results suggest that advanced or learning-based ODE solvers could further improve the trade-off between sample quality and sampling compute. These developments provide a foundation for more computationally efficient and perceptually robust image generation. Future research may also apply these improved techniques in broader contexts and other generative frameworks, paving the way for more practical and versatile models.
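As one concrete instance of the solver direction mentioned above, a second-order Heun step trades two function evaluations per step for lower discretization error than plain Euler. This is a generic numerical-analysis sketch, not a method proposed in the paper.

```python
import torch

def heun_sample(velocity_model, z0, n_steps=2):
    """Sample along dx/dt = v(x, t) with Heun's (second-order) method.
    Each step averages the velocity at the current point and at an
    Euler-predicted next point: 2 NFEs per step, but substantially
    lower discretization error than a single Euler step."""
    x = z0.clone()
    dt = 1.0 / n_steps
    with torch.no_grad():
        for i in range(n_steps):
            t = torch.full((x.shape[0],), i * dt)
            v1 = velocity_model(x, t)              # slope at current point
            v2 = velocity_model(x + dt * v1, t + dt)  # slope at predicted point
            x = x + dt * 0.5 * (v1 + v2)           # trapezoidal update
    return x
```

On nearly straight trajectories the two slopes almost agree, so the extra evaluation buys little; the payoff appears when residual curvature remains.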
Conclusion
This paper represents a significant step towards making rectified flows a viable alternative to current distillation-based methods in the low NFE regime. By introducing improved training techniques, including specialized timestep distributions and enhanced objective functions, this research demonstrates the potential for rectified flows to achieve state-of-the-art performance in image generation tasks. These contributions enhance our understanding of how to efficiently train generative models and open new possibilities for their applications in AI.
While challenges remain in streamlining training processes and achieving parity with the best consistency models, the insights and innovations presented provide a strong foundation for future advancements. The ability to generate high-quality samples with fewer computational resources not only advances the technical landscape of AI but also holds promise for more accessible and efficient deployment of generative models across varied applications.