Extending Mean-Field Variational Inference via Entropic Regularization: Theory and Computation (2404.09113v2)
Abstract: Variational inference (VI) has emerged as a popular method for approximate inference in high-dimensional Bayesian models. In this paper, we propose a novel VI method that extends naive mean-field VI via entropic regularization, referred to as $\Xi$-variational inference ($\Xi$-VI). $\Xi$-VI has a close connection to the entropic optimal transport problem and benefits from the computationally efficient Sinkhorn algorithm. We show that $\Xi$-variational posteriors effectively recover the true posterior dependence, with the dependence downweighted by the regularization parameter. We analyze the effect of the dimensionality of the parameter space on the accuracy of the $\Xi$-variational approximation and on its computational cost, providing a rough characterization of the statistical-computational trade-off in $\Xi$-VI. We also investigate the frequentist properties of $\Xi$-VI and establish results on consistency, asymptotic normality, high-dimensional asymptotics, and algorithmic stability. We provide sufficient criteria for achieving polynomial-time approximate inference with the method. Finally, we demonstrate the practical advantage of $\Xi$-VI over mean-field variational inference on simulated and real data.
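The abstract's link to entropic optimal transport is what makes the method computable: the entropically regularized problem can be solved with Sinkhorn iterations. As a hedged illustration of that mechanism (the generic Sinkhorn algorithm on a discretized toy problem, not the paper's $\Xi$-VI procedure itself), the sketch below couples two 1-d marginals on a grid and reports how the coupling's covariance shrinks as the regularization parameter grows, echoing the claim that dependence is downweighted by the regularization parameter. The grid, marginals, and variable names are illustrative assumptions.

```python
import numpy as np


def sinkhorn(mu, nu, C, eps, n_iter=500):
    """Generic Sinkhorn iterations for entropic optimal transport on a grid.

    Approximately solves  min_P <P, C> + eps * KL(P || mu (x) nu)  over
    couplings P with marginals mu and nu. This is the standard algorithm
    referenced in the abstract, not the paper's Xi-VI procedure.
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)                 # rescale to match the first marginal
        v = nu / (K.T @ u)               # rescale to match the second marginal
    return u[:, None] * K * v[None, :]   # entropic coupling


# Toy setup (illustrative): two Gaussian-shaped marginals on a shared grid.
x = np.linspace(-3.0, 3.0, 50)
mu = np.exp(-0.5 * x**2)
mu /= mu.sum()
nu = np.exp(-0.5 * (x - 1.0)**2)
nu /= nu.sum()
C = (x[:, None] - x[None, :])**2         # squared-distance cost

# The coupling's covariance (a proxy for recovered dependence) shrinks
# toward zero as eps grows.
for eps in (0.05, 0.5, 5.0):
    P = sinkhorn(mu, nu, C, eps)
    px, py = P.sum(axis=1), P.sum(axis=0)
    cov = np.sum(P * np.outer(x - x @ px, x - x @ py))
    print(f"eps={eps:4.2f}  coupling covariance ~ {cov:.3f}")
```

In this toy example the printed covariance decreases as eps increases: in the small-eps limit the coupling approaches the exact (monotone) transport plan, while large eps drives it toward the independent product coupling, the analogue of a mean-field factorization.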