Convergence of coordinate ascent variational inference for log-concave measures via optimal transport (2404.08792v1)
Abstract: Mean field variational inference (VI) is the problem of finding the closest product (factorized) measure, in the sense of relative entropy, to a given high-dimensional probability measure $\rho$. The well-known Coordinate Ascent Variational Inference (CAVI) algorithm aims to approximate this product measure by iteratively optimizing over one coordinate (factor) at a time, which can be done explicitly. Despite its popularity, the convergence of CAVI remains poorly understood. In this paper, we prove the convergence of CAVI for log-concave densities $\rho$. If additionally $\log \rho$ has Lipschitz gradient, we find a linear rate of convergence, and if also $\rho$ is strongly log-concave, we find an exponential rate. Our analysis starts from the observation that mean field VI, while notoriously non-convex in the usual sense, is in fact displacement convex in the sense of optimal transport when $\rho$ is log-concave. This allows us to adapt techniques from the optimization literature on coordinate descent algorithms in Euclidean space.
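To make the coordinate-wise structure concrete: for a product measure $q = q_1 \otimes \cdots \otimes q_d$, each CAVI step fixes all factors but one and replaces $q_i$ with the exact coordinate-wise minimizer of the relative entropy, $q_i(x_i) \propto \exp(\mathbb{E}_{q_{-i}}[\log \rho(x)])$. The sketch below is a minimal illustration of this scheme, not the paper's analysis: it takes a strongly log-concave Gaussian target $\rho(x) \propto \exp(-\tfrac{1}{2} x^\top A x)$, for which each factor $q_i$ is Gaussian with fixed variance $1/A_{ii}$ and the mean updates reduce to a Gauss-Seidel sweep. The matrix `A`, the initialization, and the tolerance are illustrative choices.

```python
import numpy as np

# Hedged sketch of CAVI for a Gaussian target rho(x) ∝ exp(-0.5 x^T A x),
# A positive definite. Each mean-field factor q_i is N(m_i, 1/A[i,i]),
# and the coordinate update is explicit:
#     m_i <- -(1/A[i,i]) * sum_{j != i} A[i,j] * m_j,
# i.e. a Gauss-Seidel sweep on the means.

def cavi_gaussian(A, m0, sweeps=100, tol=1e-10):
    """Run CAVI sweeps; return the mean-field means and factor variances."""
    d = A.shape[0]
    m = m0.astype(float).copy()
    for _ in range(sweeps):
        m_old = m.copy()
        for i in range(d):  # optimize one coordinate (factor) at a time
            off_diag = A[i, :] @ m - A[i, i] * m[i]  # sum_{j != i} A_ij m_j
            m[i] = -off_diag / A[i, i]
        if np.max(np.abs(m - m_old)) < tol:
            break
    return m, 1.0 / np.diag(A)  # variances are fixed at 1/A_ii

if __name__ == "__main__":
    A = np.array([[2.0, 0.5, 0.0],
                  [0.5, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])
    means, variances = cavi_gaussian(A, m0=np.ones(3))
    print("means:", means)          # converge to 0, the mode of rho
    print("variances:", variances)
```

Since $A$ here is positive definite, the sweep is a contraction and the means converge geometrically to the mode at the origin, consistent with the exponential-rate regime the abstract describes for strongly log-concave $\rho$.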