Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations (2404.15766v2)

Published 24 Apr 2024 in cs.LG and cs.AI

Abstract: Bayesian flow networks (BFNs) iteratively refine the parameters of distributions at various noise levels through Bayesian inference, rather than refining the samples as diffusion models (DMs) do. Owing to their differentiable nature, BFNs are promising for modeling both continuous and discrete data while maintaining fast sampling capabilities. This paper aims to understand and enhance BFNs by connecting them with DMs through stochastic differential equations (SDEs). We identify the linear SDEs corresponding to the noise-addition processes in BFNs, demonstrate that BFNs' regression losses are aligned with denoising score matching, and validate the BFN sampler as a first-order solver for the corresponding reverse-time SDE. Building on these findings and existing recipes for fast sampling in DMs, we propose specialized solvers for BFNs that markedly surpass the original BFN sampler in sample quality under a limited number of function evaluations (e.g., 10) on both image and text datasets. Notably, our best sampler achieves a speedup of 5~20 times for free. Our code is available at https://github.com/ML-GSAI/BFN-Solver.
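For context on the connection the abstract describes, the score-based SDE machinery it builds on is standard. The following is a minimal sketch with generic drift and diffusion coefficients f(t) and g(t); the BFN-specific coefficients, the loss weighting w(t), and the dedicated BFN-Solvers are derived in the paper itself and are not reproduced here.

Forward (noise-addition) linear SDE:
$$\mathrm{d}x_t = f(t)\,x_t\,\mathrm{d}t + g(t)\,\mathrm{d}w_t$$

Denoising score matching objective for a score network $s_\theta$ (Vincent, 2011):
$$\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_t}\big[\,w(t)\,\|s_\theta(x_t, t) - \nabla_{x_t}\log p_t(x_t \mid x_0)\|^2\,\big]$$

Reverse-time SDE (Anderson, 1982), simulated from noise back toward data:
$$\mathrm{d}x_t = \big[f(t)\,x_t - g(t)^2\,\nabla_x \log p_t(x_t)\big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w}_t$$

A first-order (Euler-Maruyama) step from $t$ to $t - \Delta t$, with $s_\theta \approx \nabla_x \log p_t$ and $\epsilon \sim \mathcal{N}(0, I)$:
$$x_{t-\Delta t} = x_t - \big[f(t)\,x_t - g(t)^2\,s_\theta(x_t, t)\big]\,\Delta t + g(t)\,\sqrt{\Delta t}\;\epsilon$$

In these terms, the paper's claims read as: the BFN noise-addition process induces a particular linear $(f, g)$ pair, the BFN regression loss matches the objective above up to weighting, and the original BFN sampler is exactly this first-order step; the proposed BFN-Solvers then replace that step with faster DM-style solvers.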
