- The paper introduces a lightweight velocity refiner that leverages stabilized velocity predictions to reduce computational cost and accelerate sampling.
- It employs a pseudo corrector, modifying Heun’s method to reuse previous predictions and cut down on model evaluations while retaining convergence order.
- FlowTurbo achieves significant speed and quality improvements, setting a new state-of-the-art in real-time image generation across class-conditional and text-to-image tasks.
FlowTurbo: Accelerating Flow-Based Generative Models
The paper "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner" explores the field of flow-based generative models and proposes a novel understanding and implementation to enhance their efficiency. Over recent years, diffusion models have largely dominated the field of visual generation due to their robust denoising capabilities and flexible conditional injection. However, the sampling process, which demands multiple evaluations of the denoising network, significantly increases computational costs. Flow-based models, with their innovative probability paths, offer a promising alternative, but their efficient sampling remains underexplored.
Key Contributions
1. Velocity Refiner:
The crux of FlowTurbo is a lightweight velocity refiner for flow-based models. The key observation is that the velocity predicted during sampling changes little between adjacent steps in flow-based models, in contrast to the more variable predictions of diffusion models. FlowTurbo exploits this stability by letting a small, inexpensive refiner correct a cached velocity estimate instead of re-running the full velocity predictor at every step, as sketched below.
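To make the pattern concrete, here is a minimal sketch of such a sampling loop. The interface is illustrative only: `velocity_model(x, t)` stands for the full (expensive) velocity predictor, `refiner(x, t, v_ref)` for the lightweight model that corrects a cached velocity, and the `heavy_every` alternation schedule is an assumption for exposition rather than the configuration used in the paper.

```python
def sample_with_refiner(velocity_model, refiner, x, timesteps, heavy_every=4):
    """Euler sampling where only some steps call the full velocity predictor.

    `velocity_model`, `refiner`, and `heavy_every` are hypothetical names used
    for illustration. The refiner reuses the most recent heavy prediction,
    relying on the observation that the velocity changes slowly across steps.
    """
    v_ref = None
    for i, (t, t_next) in enumerate(zip(timesteps[:-1], timesteps[1:])):
        if i % heavy_every == 0:
            v_ref = velocity_model(x, t)   # expensive: full velocity predictor
            v = v_ref
        else:
            v = refiner(x, t, v_ref)       # cheap: refine the cached velocity
        x = x + (t_next - t) * v           # Euler update along the flow ODE
    return x
```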
2. Pseudo Corrector:
The paper further accelerates sampling with a pseudo corrector that modifies the update rule of Heun's method. By reusing the velocity prediction from the previous step in place of a fresh predictor evaluation, it reduces the number of model evaluations per sampling step while retaining the original convergence order, significantly cutting computational overhead. A sketch of the contrast with standard Heun follows.
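Assume a velocity model `f(x, t)` and step size `h`; the pseudo-corrector step below reuses the velocity cached from the previous step in place of Heun's predictor evaluation, so only one new model call is made per step. This follows the idea described in the paper, but the exact update rule and bookkeeping in FlowTurbo may differ.

```python
def heun_step(f, x, t, h):
    """Standard Heun step: two model evaluations (predictor + corrector)."""
    v1 = f(x, t)                      # predictor velocity
    x_pred = x + h * v1               # provisional Euler step
    v2 = f(x_pred, t + h)             # corrector velocity at the predicted point
    return x + 0.5 * h * (v1 + v2)

def pseudo_corrector_step(f, x, t, h, v_prev):
    """Pseudo-corrector step (sketch): a single new model evaluation.

    The velocity cached from the previous step, `v_prev`, stands in for the
    predictor evaluation, halving the per-step model calls relative to Heun.
    """
    v = f(x, t)                       # the only model evaluation this step
    x_next = x + 0.5 * h * (v_prev + v)
    return x_next, v                  # return v so the next step can reuse it
```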
3. Sample-Aware Compilation:
FlowTurbo also introduces sample-aware compilation, which folds the model evaluations, the sampling-step updates, and classifier-free guidance into a single static graph for additional speedup. Compiling the whole sampling step, rather than only the network forward pass as in model-level compilation, yields further gains.
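A minimal sketch of this idea using `torch.compile` is shown below. The assumption, made here for illustration rather than taken from the paper, is that the compiled unit is one complete sampling step: both classifier-free-guidance forward passes plus the ODE update, instead of the bare network forward pass. Function and argument names are hypothetical.

```python
import torch

def make_compiled_step(model, guidance_scale):
    """Compile a whole sampling step (CFG + update) into one static graph.

    Illustrative sketch: the paper's sample-aware compilation operates at a
    similar granularity, but the exact tooling and graph boundaries may differ.
    """
    def step(x, t, t_next, cond, uncond):
        v_cond = model(x, t, cond)         # conditional velocity
        v_uncond = model(x, t, uncond)     # unconditional velocity
        v = v_uncond + guidance_scale * (v_cond - v_uncond)  # classifier-free guidance
        return x + (t_next - t) * v        # Euler update inside the same graph

    return torch.compile(step)
```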
Implementation and Numerical Results
FlowTurbo is validated empirically on both class-conditional image generation and text-to-image generation. The framework is integrated with several flow-based models, including SiT and InstaFlow, and delivers significant improvements in both speed and quality: acceleration ratios between 53.1% and 58.3% for class-conditional generation, and between 29.8% and 38.5% for text-to-image generation.
On standard benchmarks, FlowTurbo reaches an FID of 2.12 on ImageNet at 100 ms per image, confirming its suitability for real-time image generation and setting a new state of the art. Comparative analysis shows that FlowTurbo improves speed while consistently maintaining high visual quality across tasks.
Implications and Future Directions
The proposed enhancements are significant both theoretically and practically. Theoretically, FlowTurbo shows how the stability of velocity predictions in flow-based models can be harnessed to optimize the sampling process. Practically, the framework enables high-quality image generation in real time, with potential applications in image editing, inpainting, and beyond.
Future research could extend FlowTurbo to other generative frameworks. Since the method relies on the observed stability of velocity predictions, investigating whether analogous stable quantities exist in diffusion-based models could open new avenues. Further refinement of the pseudo corrector and the sample-aware compilation pipeline may also yield additional efficiency gains.
Conclusion
FlowTurbo presents a methodical approach to accelerating flow-based generative models by exploiting the stability of velocity predictions during sampling. The work sets a new state of the art for real-time image generation and pushes flow-based models toward broader, faster applications. It also provides a solid foundation for further optimization of flow-based generative frameworks, marking a significant step forward in both the practical and theoretical aspects of generative modeling.