Bayesian Flow Networks (2308.07037v6)

Published 14 Aug 2023 in cs.LG and cs.AI

Abstract: This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input to a neural network that outputs a second, interdependent distribution. Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models; however it is conceptually simpler in that no forward process is required. Discrete and continuous-time loss functions are derived for continuous, discretised and discrete data, along with sample generation procedures. Notably, the network inputs for discrete data lie on the probability simplex, and are therefore natively differentiable, paving the way for gradient-based sample guidance and few-step generation in discrete domains such as language modelling. The loss function directly optimises data compression and places no restrictions on the network architecture. In our experiments BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.


Summary

  • The paper presents a generative model in which Bayesian inference updates the parameters of a set of independent distributions, which a neural network then maps to a second, interdependent distribution, without requiring a forward diffusion process.
  • It introduces a unified framework for continuous, discretised, and discrete data, achieving competitive log-likelihoods on benchmarks such as dynamically binarized MNIST and CIFAR-10.
  • Because the network inputs for discrete data lie on the probability simplex, the approach supports gradient-based sample guidance and few-step generation, and it outperforms known discrete diffusion models on text8.

Overview of Bayesian Flow Networks

The paper introduces Bayesian Flow Networks (BFNs), a new class of generative model that integrates Bayesian inference with neural networks to iteratively model complex data distributions. In this framework, Bayesian inference updates the parameters of a set of independent input distributions in light of noisy samples of the data, and the updated parameters are passed to a neural network, which outputs a second, interdependent distribution. Starting from a simple prior and iterating these two steps yields a generative procedure similar to the reverse process of diffusion models, but conceptually simpler in that no forward process is required.
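
To make the loop concrete, the sketch below outlines an n-step generative procedure for continuous data in the spirit of the paper's Gaussian setup. The `network` callable and the specific accuracy schedule are placeholders chosen for illustration, not the paper's exact implementation; the point is only to show how the conjugate Bayesian update and the network call alternate.

```python
import math
import torch

def generate(network, n_steps=20, dim=256, sigma_1=0.02):
    """Minimal sketch of BFN-style sample generation for continuous data.

    `network(mu, t)` is a placeholder mapping the current input-distribution
    mean and time to a predicted data vector x_hat. The accuracy schedule
    below is an assumption in the spirit of the paper, not its exact constants.
    """
    mu = torch.zeros(dim)   # prior mean of the factorised input distribution
    rho = 1.0               # prior precision (shared scalar for simplicity)
    for i in range(1, n_steps + 1):
        t = (i - 1) / n_steps
        x_hat = network(mu, t)                            # interdependent output prediction
        # accuracy added at this step (assumed schedule controlled by sigma_1)
        alpha = sigma_1 ** (-2 * i / n_steps) * (1 - sigma_1 ** (2 / n_steps))
        y = x_hat + torch.randn(dim) / math.sqrt(alpha)   # noisy sample of the prediction
        mu = (rho * mu + alpha * y) / (rho + alpha)       # conjugate Gaussian update
        rho = rho + alpha
    return network(mu, 1.0)                               # final prediction
```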

Key Contributions

BFNs present several innovations in generative modeling:

  • Unified Treatment of Data Types: The model handles continuous, discretised, and discrete data within a single Bayesian framework, so varied data types are covered by one coherent approach.
  • Continuous Transmission Process: For discrete data, the network inputs lie on the probability simplex and are therefore natively differentiable, enabling gradient-based sample guidance and few-step generation in discrete domains such as language modelling (see the sketch after this list).
  • Loss Function and Optimisation: BFNs use a loss function that directly optimises data compression without imposing restrictions on the network architecture. This yields competitive log-likelihoods for image modelling on CIFAR-10 and dynamically binarized MNIST, and clear improvements over known discrete diffusion models on the text8 character-level language modelling task.
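
As a rough illustration of why simplex-valued inputs remain differentiable, the following sketch applies a multiplicative Bayesian-style update to categorical parameters. The noise construction and constants are assumptions chosen for clarity rather than a reproduction of the paper's exact sender distribution.

```python
import torch

def categorical_update(theta, y):
    """Update categorical parameters theta (on the probability simplex) given a
    noisy logit-space observation y. The multiplicative form keeps theta on the
    simplex and is differentiable end to end, which is the property highlighted
    above; the noise model in the usage example is purely illustrative."""
    logits = torch.log(theta) + y
    return torch.softmax(logits, dim=-1)

# toy usage: 4-class parameters nudged toward class 2 by a noisy observation
theta = torch.full((4,), 0.25)
y = torch.tensor([0.0, 0.0, 1.5, 0.0]) + 0.1 * torch.randn(4)
theta_new = categorical_update(theta, y)   # still sums to 1, gradient-friendly
```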

Methodology

BFNs employ a Bayesian update mechanism in which the parameters of the input distribution, starting from a simple prior, are iteratively updated in light of noisy samples of the data. The updated parameters are passed to a neural network, which outputs the parameters of an output distribution; these in turn determine the next noisy sample and Bayesian update:

  • Input and Output Distribution Dynamics: The input parameters come from Bayesian updates, while the output distribution is produced by the network. This decoupling lets the input distribution track individual variables exactly through Bayesian inference, while the network output exploits contextual information across variables.
  • Derivation of Loss Functions: The authors derive loss functions for both discrete-time and continuous-time settings, with the parameters evolving smoothly over time through a Bayesian flow whose accuracy schedule is chosen to make the input distribution increasingly informative about the data (a rough sketch of such a loss computation follows this list).
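
The sketch below shows what a single continuous-time training step could look like for continuous data, assuming the time-weighted squared-error form the paper describes. The schedule `gamma(t)` and the weighting are written from that description and may differ from the reference implementation in details.

```python
import math
import torch

def continuous_time_loss(network, x, sigma_1=0.02):
    """Sketch of one continuous-time training step for continuous data.

    Assumes the loss reduces to a time-weighted squared error between the data
    x and the network prediction, with the input-distribution mean sampled
    directly from the Bayesian flow distribution at a random time t.
    """
    t = torch.rand(())                          # t ~ U(0, 1)
    gamma = 1 - sigma_1 ** (2 * t)              # assumed accuracy schedule
    # sample the input-distribution mean from the flow distribution at time t
    mu = gamma * x + torch.sqrt(gamma * (1 - gamma)) * torch.randn_like(x)
    x_hat = network(mu, t)                      # network prediction of the data
    weight = -math.log(sigma_1) * sigma_1 ** (-2 * t)
    return (weight * (x - x_hat) ** 2).sum()
```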

Results and Evaluation

On widely used generative benchmarks, BFNs deliver strong results:

  • Competitiveness and Efficiency: On dynamically binarized MNIST, BFNs achieve test-set log-likelihoods close to the state of the art without any form of data augmentation. On CIFAR-10, they approach the leading results of variational diffusion models with substantially fewer training updates.
  • Discrete Data on the text8 Dataset: BFNs improve substantially over existing discrete diffusion models, although the overall state of the art is held by order-agnostic models.

Implications and Future Directions

The paper offers a clean integration of Bayesian inference into neural generative models that naturally accommodates different data types and supports few-step synthesis, potentially simplifying architectures and training across varied settings. The continuous transmission process used by BFNs also sets the stage for improved gradient-based guidance techniques and potentially more efficient data compression schemes.

The framework described in this paper could be a precursor to new generative models that leverage network structures more efficiently, raising the question of whether Bayesian-guided network learning can supersede traditional autoregressive methods, particularly in domains such as image generation where no natural ordering exists. Future work may focus on refining the input distribution dynamics and optimizing inference processes to further advance the expressiveness and efficacy of Bayesian Flow Networks.
