Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows (2206.06672v2)
Abstract: Training normalizing flow generative models can be challenging due to the need to calculate computationally expensive determinants of Jacobians. This paper studies the likelihood-free training of flows and proposes the energy objective, an alternative sample-based loss based on proper scoring rules. The energy objective is determinant-free and supports flexible model architectures that are not easily compatible with maximum likelihood training, including semi-autoregressive energy flows, a novel model family that interpolates between fully autoregressive and non-autoregressive models. Energy flows feature competitive sample quality, posterior inference, and generation speed relative to likelihood-based flows; this performance is decorrelated from the quality of log-likelihood estimates, which are generally very poor. Our findings question the use of maximum likelihood as an objective or a metric, and contribute to a scientific study of its role in generative modeling.
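The energy objective described above is a sample-based loss built from a proper scoring rule. As a hedged illustration (not the paper's exact objective or code), the sketch below implements one standard instantiation: the squared energy distance of Székely, which compares a batch of model samples to a batch of data samples using only pairwise Euclidean distances. Because the estimator touches samples alone, no Jacobian determinants appear anywhere in the loss. The function name `energy_distance_loss` and the PyTorch framing are assumptions for illustration.

```python
import torch

def energy_distance_loss(x_model: torch.Tensor, x_data: torch.Tensor) -> torch.Tensor:
    """Sample-based estimate of the squared energy distance
        D(p, q) = 2*E||X - Y|| - E||X - X'|| - E||Y - Y'||,
    with X ~ model and Y ~ data. The estimate is differentiable in
    x_model, so a generator can be trained through its sampler alone,
    with no log-det-Jacobian terms. (Hypothetical sketch, not the
    paper's reference implementation.)
    """
    n, m = x_model.size(0), x_data.size(0)
    # Mean pairwise distance between model and data samples.
    d_xy = torch.cdist(x_model, x_data).mean()
    # Mean off-diagonal pairwise distance within each set
    # (the diagonal entries of cdist(x, x) are zero, so dividing the
    # full sum by n*(n-1) averages only the off-diagonal terms).
    d_xx = torch.cdist(x_model, x_model).sum() / (n * (n - 1))
    d_yy = torch.cdist(x_data, x_data).sum() / (m * (m - 1))
    return 2.0 * d_xy - d_xx - d_yy

if __name__ == "__main__":
    # Toy check: the loss is near zero when both batches come from the
    # same distribution, and grows when the distributions are shifted apart.
    same = energy_distance_loss(torch.randn(256, 2), torch.randn(256, 2))
    diff = energy_distance_loss(torch.randn(256, 2) + 3.0, torch.randn(256, 2))
    print(same.item(), diff.item())
```

Because such a loss never requires inverting the model or computing determinants, it remains valid for architectures that maximum likelihood handles awkwardly; this is what permits model families like the paper's semi-autoregressive energy flows, which interpolate between fully autoregressive and non-autoregressive designs.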