Transformer Neural Autoregressive Flows (2401.01855v1)

Published 3 Jan 2024 in cs.LG

Abstract: Density estimation, a central problem in machine learning, can be performed using Normalizing Flows (NFs). NFs comprise a sequence of invertible transformations that turn a complex target distribution into a simple one by exploiting the change of variables theorem. Neural Autoregressive Flows (NAFs) and Block Neural Autoregressive Flows (B-NAFs) are arguably the most performant members of the NF family. However, they suffer from scalability issues and training instability due to the constraints imposed on the network structure. In this paper, we propose a novel solution to these challenges by exploiting transformers to define a new class of neural flows called Transformer Neural Autoregressive Flows (T-NAFs). T-NAFs treat each dimension of a random variable as a separate input token, using attention masking to enforce an autoregressive constraint. We take an amortization-inspired approach where the transformer outputs the parameters of an invertible transformation. The experimental results demonstrate that T-NAFs consistently match or outperform NAFs and B-NAFs across multiple datasets from the UCI benchmark. Remarkably, T-NAFs achieve these results using an order of magnitude fewer parameters than previous approaches, without composing multiple flows.
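For context, the change of variables identity that normalizing flows rely on is: for an invertible map z = f(x), log p_X(x) = log p_Z(f(x)) + log |det(∂f(x)/∂x)|, and an autoregressive f makes the Jacobian triangular so the determinant is cheap to compute. The sketch below is a minimal, hypothetical illustration of the recipe described in the abstract (dimensions as tokens, a causal attention mask enforcing the autoregressive constraint, the transformer amortizing the parameters of an invertible per-dimension transform). It substitutes a simple affine transform for the more expressive monotone transformation used in the paper, and all class names and hyperparameters are placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyTNAFSketch(nn.Module):
    """Hypothetical sketch of the idea in the abstract: each dimension of x is a
    token, a causal attention mask enforces the autoregressive constraint, and the
    transformer outputs the parameters of an invertible per-dimension transform
    (affine here, purely for illustration)."""

    def __init__(self, dim, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.dim = dim
        self.embed = nn.Linear(1, d_model)                      # scalar dim -> token
        self.start = nn.Parameter(torch.zeros(1, 1, d_model))   # learned "start" token
        self.pos = nn.Parameter(torch.zeros(1, dim, d_model))   # positional embeddings
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.to_params = nn.Linear(d_model, 2)                  # (log_scale, shift) per dim

    def forward(self, x):                                       # x: (batch, dim)
        b = x.size(0)
        tok = self.embed(x.unsqueeze(-1))                       # (batch, dim, d_model)
        # Shift tokens right and prepend the start token so position i only sees
        # x_{<i}; a standard causal mask then enforces the autoregressive structure.
        tok = torch.cat([self.start.expand(b, -1, -1), tok[:, :-1]], dim=1) + self.pos
        mask = nn.Transformer.generate_square_subsequent_mask(self.dim)
        h = self.encoder(tok, mask=mask)                        # (batch, dim, d_model)
        log_scale, shift = self.to_params(h).chunk(2, dim=-1)
        z = x * log_scale.squeeze(-1).exp() + shift.squeeze(-1) # invertible in each x_i
        log_det = log_scale.squeeze(-1).sum(dim=1)              # log |det dz/dx|
        return z, log_det


# Log-likelihood under a standard normal base distribution via change of variables:
# log p_X(x) = log p_Z(z) + log |det dz/dx|
if __name__ == "__main__":
    flow = TinyTNAFSketch(dim=8)
    x = torch.randn(16, 8)
    z, log_det = flow(x)
    base = torch.distributions.Normal(0.0, 1.0)
    log_px = base.log_prob(z).sum(dim=1) + log_det
    print(log_px.shape)  # torch.Size([16])
```

Because the mask makes the parameters for x_i depend only on x_{<i}, the Jacobian of z with respect to x is triangular and its log-determinant reduces to the sum of per-dimension log-scales, which is what the sketch returns.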

