
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency (2407.02398v1)

Published 2 Jul 2024 in cs.CV

Abstract: Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate high-quality samples with fewer function evaluations, typically through iterative rectification methods or optimal transport solutions. In this paper, we introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field. Consistency-FM directly defines straight flows starting from different times to the same endpoint, imposing constraints on their velocity values. Additionally, we propose a multi-segment training approach for Consistency-FM to enhance expressiveness, achieving a better trade-off between sampling quality and speed. Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models and 1.7x faster than rectified flow models while achieving better generation quality. Our code is available at: https://github.com/YangLing0818/consistency_flow_matching

Authors (9)
  1. Ling Yang (88 papers)
  2. Zixiang Zhang (3 papers)
  3. Zhilong Zhang (20 papers)
  4. Xingchao Liu (28 papers)
  5. Minkai Xu (40 papers)
  6. Wentao Zhang (261 papers)
  7. Chenlin Meng (39 papers)
  8. Stefano Ermon (279 papers)
  9. Bin Cui (165 papers)
Citations (5)

Summary

Consistency Flow Matching: A Novel Approach to Enhance Generative Model Efficiency

The paper introduces Consistency Flow Matching (Consistency-FM), a method for improving the efficiency of flow-based generative models. Its goal is to generate high-quality samples with fewer function evaluations, which it achieves by enforcing a self-consistency property on the velocity field of the Ordinary Differential Equation (ODE) that transports noise samples to data samples.

Core Concepts and Methodology

Flow matching (FM) is central to this paper. FM learns a vector field that defines an ODE whose trajectories transport noise samples to the data distribution. Prior approaches struggle to balance computational cost against sampling quality: existing methods such as Consistency Models (CMs) and Rectified Flow either require computationally expensive optimal transport plans or suffer from error accumulation in their iterative training procedures.
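For reference, a minimal sketch of a standard flow matching training step with a linear interpolation path is shown below. It assumes a PyTorch-style velocity network velocity_net(x, t); the names and setup are illustrative and not taken from the paper's code.

```python
import torch

def flow_matching_loss(velocity_net, x1):
    """Standard flow matching loss on a data batch x1, using the linear path
    x_t = (1 - t) * x0 + t * x1 with noise x0 ~ N(0, I), whose target
    velocity is the constant x1 - x0."""
    x0 = torch.randn_like(x1)                      # noise endpoint of the path
    t = torch.rand(x1.shape[0], device=x1.device)  # one random time per sample
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))       # reshape for broadcasting
    xt = (1.0 - t_) * x0 + t_ * x1                 # point on the straight path
    target_v = x1 - x0                             # velocity of the linear path
    pred_v = velocity_net(xt, t)                   # model's velocity estimate
    return ((pred_v - target_v) ** 2).mean()
```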

Consistency-FM addresses these issues by directly defining straight flows with consistent velocities. It extends prior work in two ways (a sketch of the resulting training objective follows this list):

  • Self-Consistency in Velocity Fields: Consistency-FM defines straight flows by requiring the velocity to stay constant along the trajectory from any starting time to the shared endpoint, avoiding full trajectory reconstructions and explicit optimal transport estimates.
  • Multi-Segment Optimization: The time interval is divided into multiple segments, each trained to be self-consistent. The resulting piecewise-linear paths are more expressive and better suited to complex data distributions.
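A rough sketch of how such a velocity-consistency objective can be formed is given below. It matches the straight-flow endpoint prediction f(x_t, t) = x_t + (1 - t) * v(x_t, t), and the velocity itself, at two nearby times, using an EMA copy of the network as the target. The single-segment form, variable names, and weighting alpha are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def consistency_fm_loss(velocity_net, ema_net, x1, delta=1e-2, alpha=1.0):
    """Illustrative single-segment velocity-consistency loss.

    Matches the straight-flow endpoint prediction
        f(x_t, t) = x_t + (1 - t) * v(x_t, t)
    and the velocity itself at two nearby times t and t + delta, using an
    exponential-moving-average (EMA) copy of the network as the target."""
    x0 = torch.randn_like(x1)                                # shared noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device) * (1.0 - delta)
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))                 # broadcastable time
    xt = (1.0 - t_) * x0 + t_ * x1                           # point at time t
    xt_next = (1.0 - (t_ + delta)) * x0 + (t_ + delta) * x1  # point at t + delta

    v = velocity_net(xt, t)
    f = xt + (1.0 - t_) * v                                  # predicted endpoint from t
    with torch.no_grad():                                    # target branch: no gradients
        v_tgt = ema_net(xt_next, t + delta)
        f_tgt = xt_next + (1.0 - (t_ + delta)) * v_tgt       # predicted endpoint from t + delta
    endpoint_term = ((f - f_tgt) ** 2).mean()                # consistency of endpoints
    velocity_term = ((v - v_tgt) ** 2).mean()                # consistency of velocities
    return endpoint_term + alpha * velocity_term
```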

These constructions are grounded theoretically: the paper analyzes the consistency constraints alongside the approximation errors they introduce, providing a rigorous framework for training such models efficiently.

Experimental Insights

Empirical validation demonstrates significant gains. On standard image generation datasets (CIFAR-10, CelebA-HQ, and AFHQ-Cat), Consistency-FM converges 4.4 times faster than consistency models and 1.7 times faster than rectified flow models while achieving better generation quality. For instance, it reaches a Fréchet Inception Distance (FID) of 5.34 on CIFAR-10, lower than the FIDs reported for Consistency Models and Rectified Flow.

Practical and Theoretical Implications

The work has both immediate and longer-term implications. Practically, Consistency-FM improves on existing benchmarks by delivering high-quality samples at substantially lower computational cost. This efficiency is crucial for scaling generative models to higher-resolution tasks and to broader applications such as text-to-image generation.
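Because the learned flows are straight (or piecewise linear), only a handful of ODE integration steps are needed at sampling time. A minimal sketch of few-step Euler sampling, assuming a trained velocity network velocity_net (the step count and interface are illustrative assumptions):

```python
import torch

@torch.no_grad()
def sample(velocity_net, shape, n_steps=2, device="cpu"):
    """Few-step Euler integration of dx/dt = v(x, t) from noise (t = 0) to data (t = 1).
    Straight or piecewise-linear flows keep this accurate even for tiny n_steps."""
    x = torch.randn(shape, device=device)                          # start from pure noise
    ts = torch.linspace(0.0, 1.0, n_steps + 1, device=device)
    for i in range(n_steps):
        t = torch.full((shape[0],), float(ts[i]), device=device)   # per-sample time
        dt = float(ts[i + 1] - ts[i])
        x = x + dt * velocity_net(x, t)                            # one Euler step along the flow
    return x
```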

Theoretically, Consistency-FM reframes flow models by embedding consistency directly into the velocity field rather than into trajectory pathways, which could open new directions in efficient generative modeling. The results also suggest that velocity consistency can be combined with pretrained models, opening new avenues for distillation across model families.

Future Directions

Several research avenues remain for future exploration. Extending Consistency-FM to more complex generative tasks, such as text-to-image synthesis, is a natural next step. Moreover, distilling existing diffusion models (DMs) with the principles of Consistency-FM could change how such models are applied to large-scale datasets.

In conclusion, Consistency-FM marks a substantial methodological advance in generative modeling. By making velocity consistency a core design principle, it delivers strong sample quality with high computational efficiency and points toward further innovations in efficient generative AI.
