Convergence of coordinate ascent variational inference for log-concave measures via optimal transport (2404.08792v1)

Published 12 Apr 2024 in stat.ML, cs.LG, math.OC, math.PR, math.ST, and stat.TH

Abstract: Mean field variational inference (VI) is the problem of finding the closest product (factorized) measure, in the sense of relative entropy, to a given high-dimensional probability measure $\rho$. The well known Coordinate Ascent Variational Inference (CAVI) algorithm aims to approximate this product measure by iteratively optimizing over one coordinate (factor) at a time, which can be done explicitly. Despite its popularity, the convergence of CAVI remains poorly understood. In this paper, we prove the convergence of CAVI for log-concave densities $\rho$. If additionally $\log \rho$ has Lipschitz gradient, we find a linear rate of convergence, and if also $\rho$ is strongly log-concave, we find an exponential rate. Our analysis starts from the observation that mean field VI, while notoriously non-convex in the usual sense, is in fact displacement convex in the sense of optimal transport when $\rho$ is log-concave. This allows us to adapt techniques from the optimization literature on coordinate descent algorithms in Euclidean space.

Summary

  • The paper shows that, for log-concave targets, the CAVI iterates are tight and every weak limit point is a minimizer of the MFVI problem.
  • The paper uses optimal transport and Wasserstein geometry to establish linear and exponential convergence rates under Lipschitz-gradient and strong log-concavity conditions, respectively.
  • The paper provides explicit rate formulas that guide the number of iterations needed for accurate Bayesian inference in practical applications.

Convergence Analysis of CAVI for Log-Concave Distributions through Optimal Transport

Introduction

Coordinate Ascent Variational Inference (CAVI) is a popular method in Bayesian machine learning for approximating complex probability distributions. It restricts attention to product (factorized) distributions and iteratively minimizes the Kullback-Leibler divergence to the target, optimizing over one factor at a time. This paper, by Manuel Arnese and Daniel Lacker, presents a rigorous mathematical analysis of the convergence of CAVI applied to log-concave target distributions. The key contribution is a proof of the algorithm's convergence under log-concavity, together with explicit rates of convergence under additional regularity conditions on the target distribution.
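As a minimal, self-contained illustration (our own example, not taken from the paper, which treats general log-concave targets), consider a Gaussian target: there each CAVI coordinate update is available in closed form, and one sweep of CAVI reduces to a Gauss-Seidel update of the factor means. The sketch below assumes a target $\rho = \mathcal{N}(0, \Sigma)$.

```python
import numpy as np

# Minimal CAVI sketch for a Gaussian target rho = N(0, Sigma) (illustrative only,
# not the paper's general setting). With precision matrix Lam = Sigma^{-1}, the
# optimal mean-field factors are Gaussian: factor i has variance 1 / Lam[i, i],
# and its CAVI mean update uses the current means of the other factors, so one
# CAVI sweep is exactly a Gauss-Seidel sweep on the mean vector.

Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])        # covariance of a strongly log-concave target
Lam = np.linalg.inv(Sigma)            # precision matrix
d = Sigma.shape[0]

m = np.ones(d)                        # means of the mean-field factors (initial guess)
for sweep in range(50):               # CAVI: repeatedly cycle over the coordinates
    for i in range(d):
        others = [j for j in range(d) if j != i]
        # q_i(x_i) ∝ exp(E_{q_{-i}}[log rho(x)])  =>  Gaussian with this mean:
        m[i] = -Lam[i, others] @ m[others] / Lam[i, i]

print("mean-field means:", m)                    # close to the true mean (0, 0)
print("mean-field variances:", 1.0 / np.diag(Lam))
```

Running the sweeps drives the factor means geometrically toward the true mean, which is the behavior the paper's exponential-rate result formalizes in the strongly log-concave case.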

Setting and Main Results

The authors consider a variational inference problem in which the target measure admits a log-concave density, i.e., one of the form $\rho \propto e^{-V}$ for a convex potential $V$. The specific contributions of this paper are as follows (the MFVI objective and the CAVI update are written out schematically after the list):

  • General Convergence: For target measures with log-concave densities, the sequence generated by CAVI is tight, and every weak limit point is a minimizer of the mean field variational inference (MFVI) problem.
  • Strictly Convex Case: If the potential $V$ is strictly convex, the MFVI problem has a unique minimizer, and the CAVI iterates converge weakly to it.
  • Lipschitz Gradient: If $\nabla V$ is Lipschitz (in addition to log-concavity), a linear rate of convergence is established, with an explicit formula depending on the Lipschitz constant and the dimension of the problem.
  • Strongly Convex Case: If $V$ is moreover strongly convex (i.e., $\rho$ is strongly log-concave) with Lipschitz gradient, an exponential rate of convergence is proven, depending on the strong convexity parameter, the Lipschitz constant, and the dimension.
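For orientation, the MFVI objective and the generic CAVI update can be written schematically as follows; this is the standard formulation implied by the abstract, with the potential $V$ and factors $q_i$ as above rather than notation lifted verbatim from the paper:

$$
\min_{q = q_1 \otimes \cdots \otimes q_d} \ \mathrm{KL}(q \,\|\, \rho), \qquad \rho \propto e^{-V}, \quad V \text{ convex},
$$

and the CAVI update of the $i$-th factor, with the other factors held fixed, takes the explicit form

$$
q_i(x_i) \ \propto\ \exp\!\Big( -\, \mathbb{E}_{x_{-i} \sim q_{-i}}\big[ V(x_1, \dots, x_d) \big] \Big).
$$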

Wasserstein Geometry of MFVI

A significant part of the analysis views the problem through the lens of optimal transport. By framing MFVI over the Wasserstein space of probability measures, the authors exploit the fact that, while the objective is notoriously non-convex in the usual (linear) sense, it is displacement (geodesically) convex when the target is log-concave. This allows them to adapt techniques from the optimization literature on coordinate descent in Euclidean space, and this geometric perspective is central to the main results of the paper.
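As background, these are standard optimal-transport facts rather than statements taken from the paper: the displacement convexity can be seen from the decomposition

$$
\mathrm{KL}(q \,\|\, \rho) \;=\; \int q \log q \, dx \;+\; \int V \, dq \;+\; \mathrm{const},
$$

where the entropy term $\int q \log q$ is displacement convex in every dimension and the potential term $\int V \, dq$ is displacement convex precisely when $V$ is convex (McCann's convexity principle). Hence the MFVI objective is convex along Wasserstein geodesics whenever $\rho \propto e^{-V}$ is log-concave, even though it is non-convex in the usual linear sense.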

Implications and Further Developments

This paper has theoretical implications for the understanding of variational inference algorithms, specifically highlighting the importance of the log-concavity assumption in ensuring convergence. From a practical standpoint, the explicit convergence rates provided can guide the application of CAVI in Bayesian statistics and machine learning, particularly in determining the number of iterations needed to achieve a certain accuracy. Looking forward, the methodological framework introduced here opens avenues for analyzing the convergence of variational inference algorithms in more general settings, potentially extending to non-log-concave measures and different classes of variational families.
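As a back-of-the-envelope illustration (generic arithmetic under an assumed exponential rate, not the paper's specific constants): if each CAVI sweep contracts the KL suboptimality by a factor $1 - \kappa$, then reaching accuracy $\varepsilon$ from an initial gap $\varepsilon_0$ requires roughly

$$
n \;\ge\; \frac{1}{\kappa} \log \frac{\varepsilon_0}{\varepsilon}
$$

sweeps, so the number of iterations grows only logarithmically in the target accuracy.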

Conclusion

This paper by Arnese and Lacker advances our understanding of the convergence behavior of the CAVI algorithm. Through a careful mathematical analysis rooted in optimal transport theory, it establishes conditions under which CAVI converges, and quantifies the rate of convergence in terms of properties of the target distribution. This work stands to be a significant reference in the ongoing development of efficient and reliable variational inference methods.