Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs (2403.13748v2)
Abstract: Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though $p$ itself does not factorize. We show that this mismatch leads to an impossibility theorem: if $p$ does not factorize, then any factorized approximation $q \in Q$ can correctly estimate at most one of the following three measures of uncertainty: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance (which can be related to the entropy). In practice, the best variational approximation in $Q$ is found by minimizing some divergence $D(q,p)$ between distributions, and so we ask: how does the choice of divergence determine which measure of uncertainty, if any, is correctly estimated by VI? We consider the classic Kullback-Leibler divergences, the more general Rényi divergences, and a score-based divergence which compares $\nabla \log p$ and $\nabla \log q$. We provide a thorough theoretical analysis in the setting where $p$ is a Gaussian and $q$ is a (factorized) Gaussian. We show that all the considered divergences can be *ordered* based on the estimates of uncertainty they yield as objective functions for VI. Finally, we empirically evaluate the validity of this ordering when the target distribution $p$ is not Gaussian.
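To make the abstract's trade-off concrete, here is a minimal NumPy sketch (not from the paper) for a 2-D Gaussian target. It relies on two standard closed-form results for Gaussian mean-field VI: minimizing the reverse KL divergence $\mathrm{KL}(q \,\|\, p)$ over factorized Gaussians matches the diagonal of the target's precision matrix, while minimizing the forward KL divergence $\mathrm{KL}(p \,\|\, q)$ matches the target's marginal moments. The specific covariance values are illustrative assumptions.

```python
import numpy as np

# Target: a correlated 2-D Gaussian p = N(0, Sigma) (illustrative values).
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lambda = np.linalg.inv(Sigma)  # precision matrix of p

# Mean-field Gaussian fit by reverse KL, KL(q || p): the optimal q_i has
# precision Lambda_ii, so the marginal precisions are estimated exactly.
var_rkl = 1.0 / np.diag(Lambda)

# Mean-field Gaussian fit by forward KL, KL(p || q): moment matching,
# so the marginal variances are estimated exactly.
var_fkl = np.diag(Sigma)

# Compare the three measures of uncertainty from the impossibility theorem.
report = {
    "marginal variances":   (np.diag(Sigma), var_rkl, var_fkl),
    "marginal precisions":  (np.diag(Lambda), 1.0 / var_rkl, 1.0 / var_fkl),
    "generalized variance": (np.linalg.det(Sigma),
                             np.prod(var_rkl), np.prod(var_fkl)),
}
for name, (target, rkl, fkl) in report.items():
    print(f"{name:>20}: target={target}, reverse-KL={rkl}, forward-KL={fkl}")
```

In this example the reverse-KL fit recovers the marginal precisions but understates the marginal variances (0.36 vs. 1.0) and the generalized variance (0.1296 vs. 0.36), while the forward-KL fit recovers the marginal variances but overstates the generalized variance (1.0 vs. 0.36). Each divergence gets exactly one of the three measures right, consistent with the impossibility theorem for any factorized $q$.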
- Charles C. Margossian
- Loucas Pillaud-Vivien
- Lawrence K. Saul