Bayesian Flow Networks (2308.07037v6)
Abstract: This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input to a neural network that outputs a second, interdependent distribution. Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models; however it is conceptually simpler in that no forward process is required. Discrete and continuous-time loss functions are derived for continuous, discretised and discrete data, along with sample generation procedures. Notably, the network inputs for discrete data lie on the probability simplex, and are therefore natively differentiable, paving the way for gradient-based sample guidance and few-step generation in discrete domains such as language modelling. The loss function directly optimises data compression and places no restrictions on the network architecture. In our experiments BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
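To make the iterative procedure concrete, below is a minimal sketch of the generative (sampling) loop for continuous data: a standard-normal prior over the input-distribution parameters is alternately passed through the network and updated by conjugate Bayesian inference on a noisy sample drawn from the network's output distribution. It assumes the accuracy schedule α_i = σ₁^(−2i/n)(1 − σ₁^(2/n)); `output_net` is a hypothetical stand-in for the trained network, and the constants are illustrative only.

```python
import numpy as np


def output_net(mu, t):
    """Hypothetical stand-in for the trained BFN output network.

    A real network maps the input-distribution means ``mu`` and the time
    ``t`` to the mean of the output distribution; here it returns ``mu``
    unchanged so the loop runs end to end.
    """
    return mu


def bfn_sample_continuous(dim=4, n_steps=20, sigma_1=0.02, seed=0):
    """Sketch of the BFN generative procedure for continuous data."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)   # input-distribution means (prior mean 0)
    rho = 1.0            # input-distribution precision (prior precision 1)

    for i in range(1, n_steps + 1):
        t = (i - 1) / n_steps
        x_hat = output_net(mu, t)        # mean of the output distribution
        # Per-step accuracy under the schedule assumed above
        alpha = sigma_1 ** (-2 * i / n_steps) * (1 - sigma_1 ** (2 / n_steps))
        y = rng.normal(x_hat, alpha ** -0.5)   # noisy sample of the output
        # Conjugate Gaussian update: precisions add, means combine
        # in proportion to their precisions
        mu = (rho * mu + alpha * y) / (rho + alpha)
        rho = rho + alpha

    return output_net(mu, 1.0)           # final estimate of the data


print(bfn_sample_continuous())
```

The same two-step pattern (network pass, then closed-form Bayesian update) carries over to discretised and discrete data, with the Gaussian update replaced by the corresponding conjugate update on the probability simplex.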