Consistency Diffusion Bridge Models (2410.22637v2)

Published 30 Oct 2024 in cs.LG and cs.CV

Abstract: Diffusion models (DMs) have become the dominant paradigm of generative modeling in a variety of domains by learning stochastic processes from noise to data. Recently, denoising diffusion bridge models (DDBMs), a new formulation of generative modeling that builds stochastic processes between fixed data endpoints based on a reference diffusion process, have achieved empirical success on tasks with coupled data distributions, such as image-to-image translation. However, DDBMs' sampling process typically requires hundreds of network evaluations to achieve decent performance, which may impede their practical deployment due to high computational demands. In this work, inspired by recent advances in consistency models for DMs, we tackle this problem by learning the consistency function of the probability-flow ordinary differential equation (PF-ODE) of DDBMs, which directly predicts the solution at a starting step given any point on the ODE trajectory. Based on a dedicated general-form ODE solver, we propose two paradigms, consistency bridge distillation and consistency bridge training, which are flexible to apply to DDBMs with broad design choices. Experimental results show that our proposed method can sample $4\times$ to $50\times$ faster than the base DDBM while producing better visual quality at the same number of steps across various tasks, with pixel resolutions ranging from $64 \times 64$ to $256 \times 256$, and supports downstream tasks such as semantic interpolation in the data space.
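The key object here is the consistency function of the bridge PF-ODE: a single network $f(\mathbf{x}_t, t)$ that maps any point on the ODE trajectory between the two fixed endpoints directly to the trajectory's solution at the starting step, so one network evaluation can replace a long solver run. Below is a minimal sketch of the distillation paradigm, assuming PyTorch; `f_theta`, `f_theta_minus` (an EMA copy of the student), `ode_solver_step` (one step of a pretrained DDBM PF-ODE solver), and `sample_xt` are hypothetical placeholders, not the authors' actual API.

```python
import torch

def consistency_bridge_distillation_loss(
    f_theta,          # student consistency network: (x_t, t, x_T) -> predicted x_0
    f_theta_minus,    # EMA copy of the student, used as the target network
    ode_solver_step,  # one step of a pretrained DDBM PF-ODE solver: (x_t, t, s, x_T) -> x_s
    sample_xt,        # samples x_t from the bridge marginal q(x_t | x_0, x_T)
    x0, xT,           # a paired training example (fixed data endpoints)
    t, s,             # adjacent timesteps on the discretized trajectory, s < t
):
    # Place a point on the bridge between the fixed endpoints.
    x_t = sample_xt(x0, xT, t)

    # Move one PF-ODE step toward the data endpoint; the teacher solver and
    # the target network are both frozen here.
    with torch.no_grad():
        x_s = ode_solver_step(x_t, t, s, xT)
        target = f_theta_minus(x_s, s, xT)

    # Self-consistency: points on the same ODE trajectory must map to the
    # same prediction of the starting-step solution.
    pred = f_theta(x_t, t, xT)
    return torch.mean((pred - target) ** 2)
```

The consistency bridge training paradigm keeps the same self-consistency objective but, by analogy with consistency training for DMs, replaces the teacher solver step with an estimate built from the bridge's known marginals, so no pretrained DDBM is required.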

Authors (5)
  1. Guande He (13 papers)
  2. Kaiwen Zheng (48 papers)
  3. Jianfei Chen (63 papers)
  4. Fan Bao (30 papers)
  5. Jun Zhu (424 papers)
Citations (1)
