Stable Training of Normalizing Flows for High-dimensional Variational Inference (2402.16408v1)
Abstract: Variational inference with normalizing flows (NFs) is an increasingly popular alternative to MCMC methods. In particular, NFs based on coupling layers (Real NVPs) are frequently used due to their good empirical performance. In theory, increasing the depth of normalizing flows should lead to more accurate posterior approximations. However, in practice, training deep normalizing flows for approximating high-dimensional posterior distributions is often infeasible due to the high variance of the stochastic gradients. In this work, we show that previous methods for stabilizing the variance of stochastic gradient descent can be insufficient to achieve stable training of Real NVPs. As the source of the problem, we identify that, during training, samples often exhibit unusually high values. As a remedy, we propose a combination of two methods: (1) soft-thresholding of the scale in Real NVPs, and (2) a bijective soft log transformation of the samples. We evaluate these and other previously proposed modifications on several challenging target distributions, including a high-dimensional horseshoe logistic regression model. Our experiments show that with our modifications, stable training of Real NVPs for posteriors with several thousand dimensions is possible, allowing for more accurate marginal likelihood estimation via importance sampling. Moreover, we evaluate several common training techniques and architecture choices and provide practical advice for training NFs for high-dimensional variational inference.
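To make the two proposed remedies concrete, below is a minimal PyTorch sketch of (1) a soft threshold on the log-scale output of a Real NVP coupling layer and (2) a bijective "soft log" transformation of the samples. The specific functional forms (a tanh-based clamp, a piecewise identity/log map with a cutoff) and the parameter values are illustrative assumptions, not the paper's exact definitions; in a real flow the soft-log layer would also need its log-determinant Jacobian accounted for.

```python
import torch

def soft_clamp_scale(s, limit=1.9):
    """Softly threshold the log-scale s of a coupling layer so exp(s)
    cannot explode; tanh-based clamp is an assumed illustrative form."""
    return limit * torch.tanh(s / limit)

def soft_log(z, threshold=100.0):
    """Bijective 'soft log' of samples: identity near zero, logarithmic
    growth beyond a threshold (assumed piecewise form)."""
    a = torch.abs(z)
    return torch.where(
        a <= threshold,
        z,
        torch.sign(z) * (threshold + torch.log1p(a - threshold)),
    )

def soft_log_inverse(y, threshold=100.0):
    """Inverse of soft_log; the map must stay bijective so the flow
    remains a valid change of variables."""
    a = torch.abs(y)
    return torch.where(
        a <= threshold,
        y,
        torch.sign(y) * (threshold + torch.expm1(a - threshold)),
    )
```

Both maps are continuous at the cutoff (the log branch equals the identity branch at |z| = threshold), so extreme samples are compressed smoothly rather than clipped, which is the behavior the abstract's remedies aim for.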