Flow Generator Matching (2410.19310v1)

Published 25 Oct 2024 in cs.CV, cs.AI, cs.LG, and cs.MM

Abstract: In the realm of Artificial Intelligence Generated Content (AIGC), flow-matching models have emerged as a powerhouse, achieving success thanks to their robust theoretical underpinnings and proven ability for large-scale generative modeling. These models have demonstrated state-of-the-art performance, but their brilliance comes at a cost: sampling from them is notoriously demanding on computational resources, as it requires multi-step numerical ODE solvers. Against this backdrop, this paper presents Flow Generator Matching (FGM), a novel approach with theoretical guarantees that accelerates sampling from flow-matching models to a single generation step while maintaining the original performance. On the CIFAR10 unconditional generation benchmark, our one-step FGM model achieves a new record Fréchet Inception Distance (FID) score of 3.08 among few-step flow-matching-based models, outperforming the original 50-step flow-matching models. Furthermore, we use FGM to distill Stable Diffusion 3, a leading text-to-image flow-matching model based on the MM-DiT architecture. The resulting MM-DiT-FGM one-step text-to-image model delivers outstanding industry-level performance. On the GenEval benchmark, MM-DiT-FGM achieves generation quality that rivals multi-step models while requiring only a single generation step.
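
For context, the sampling cost the abstract refers to comes from integrating a learned velocity field over many ODE steps, whereas a distilled one-step generator replaces that loop with a single forward pass. The sketch below is an illustration of this background contrast under stated assumptions, not the paper's FGM objective or code: it shows standard conditional flow-matching training on a linear interpolation path, 50-step Euler sampling, and the single-call generator an FGM-style model would provide. `velocity_net` and `generator` are hypothetical PyTorch modules with the shown call signatures.

```python
# Minimal sketch (assumptions, not the paper's implementation): standard
# conditional flow matching plus multi-step Euler sampling, contrasted with
# the one-step generator call that distillation methods such as FGM target.
import torch


def flow_matching_loss(velocity_net, x1):
    """Regress the network onto the straight-line target velocity (x1 - x0)."""
    x0 = torch.randn_like(x1)                      # noise endpoint of the path
    t = torch.rand(x1.shape[0], device=x1.device)  # uniform time in [0, 1]
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))       # reshape for broadcasting
    xt = (1.0 - t_) * x0 + t_ * x1                 # linear interpolation path
    target = x1 - x0                               # conditional target velocity
    pred = velocity_net(xt, t)
    return ((pred - target) ** 2).mean()


@torch.no_grad()
def sample_multistep(velocity_net, shape, steps=50, device="cpu"):
    """Euler integration of the learned ODE: the expensive multi-step path."""
    x = torch.randn(shape, device=device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + dt * velocity_net(x, t)            # one Euler step
    return x


@torch.no_grad()
def sample_one_step(generator, shape, device="cpu"):
    """What a distilled one-step generator replaces the loop above with."""
    z = torch.randn(shape, device=device)
    return generator(z)                            # single forward pass
```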
