Convergence of coordinate ascent variational inference for log-concave measures via optimal transport (2404.08792v1)

Published 12 Apr 2024 in stat.ML, cs.LG, math.OC, math.PR, math.ST, and stat.TH

Abstract: Mean field variational inference (VI) is the problem of finding the closest product (factorized) measure, in the sense of relative entropy, to a given high-dimensional probability measure $\rho$. The well known Coordinate Ascent Variational Inference (CAVI) algorithm aims to approximate this product measure by iteratively optimizing over one coordinate (factor) at a time, which can be done explicitly. Despite its popularity, the convergence of CAVI remains poorly understood. In this paper, we prove the convergence of CAVI for log-concave densities $\rho$. If additionally $\log \rho$ has Lipschitz gradient, we find a linear rate of convergence, and if also $\rho$ is strongly log-concave, we find an exponential rate. Our analysis starts from the observation that mean field VI, while notoriously non-convex in the usual sense, is in fact displacement convex in the sense of optimal transport when $\rho$ is log-concave. This allows us to adapt techniques from the optimization literature on coordinate descent algorithms in Euclidean space.

Summary

  • The paper shows that, for log-concave targets, the CAVI iterates are tight and every weak limit point is a minimizer of the MFVI problem.
  • The paper uses optimal transport and Wasserstein geometry to establish linear and exponential convergence rates under Lipschitz-gradient and strong log-concavity conditions, respectively.
  • The paper provides explicit rate formulas that guide the number of iterations needed for accurate Bayesian inference in practical applications.

Convergence Analysis of CAVI for Log-Concave Distributions through Optimal Transport

Introduction

Coordinate Ascent Variational Inference (CAVI) is a popular method in Bayesian machine learning for approximating complex probability distributions. It restricts attention to product (factorized) distributions and iteratively minimizes the Kullback-Leibler divergence to the target, optimizing over one factor at a time. This paper, by Manuel Arnese and Daniel Lacker, presents a rigorous mathematical analysis of the convergence of CAVI applied to log-concave target distributions. The key contribution is a proof of the algorithm's convergence under log-concavity, together with explicit rates of convergence under additional regularity conditions on the target distribution.
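As a minimal, self-contained illustration (our own example, not taken from the paper, which treats general log-concave targets), consider a Gaussian target: there each CAVI coordinate update is available in closed form, and one sweep of CAVI reduces to a Gauss-Seidel update of the factor means. The sketch below assumes a target $\rho = \mathcal{N}(0, \Sigma)$.

```python
import numpy as np

# Minimal CAVI sketch for a Gaussian target rho = N(0, Sigma) (illustrative only,
# not the paper's general setting). With precision matrix Lam = Sigma^{-1}, the
# optimal mean-field factors are Gaussian: factor i has variance 1 / Lam[i, i],
# and its CAVI mean update uses the current means of the other factors, so one
# CAVI sweep is exactly a Gauss-Seidel sweep on the mean vector.

Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])        # covariance of a strongly log-concave target
Lam = np.linalg.inv(Sigma)            # precision matrix
d = Sigma.shape[0]

m = np.ones(d)                        # means of the mean-field factors (initial guess)
for sweep in range(50):               # CAVI: repeatedly cycle over the coordinates
    for i in range(d):
        others = [j for j in range(d) if j != i]
        # q_i(x_i) ∝ exp(E_{q_{-i}}[log rho(x)])  =>  Gaussian with this mean:
        m[i] = -Lam[i, others] @ m[others] / Lam[i, i]

print("mean-field means:", m)                    # close to the true mean (0, 0)
print("mean-field variances:", 1.0 / np.diag(Lam))
```

Running the sweeps drives the factor means geometrically toward the true mean, which is the behavior the paper's exponential-rate result formalizes in the strongly log-concave case.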

Setting and Main Results

The authors consider a variational inference problem in which the target measure admits a log-concave density, i.e., one of the form $\rho \propto e^{-V}$ for a convex potential $V$. The specific contributions of this paper are as follows (the MFVI objective and the CAVI update are written out schematically after the list):

  • General Convergence: For target measures with log-concave densities, the sequence generated by CAVI is tight, and every weak limit point is a minimizer of the mean field variational inference (MFVI) problem.
  • Strictly Convex Case: If the potential $V$ is strictly convex, the MFVI problem has a unique minimizer, and the CAVI iterates converge weakly to it.
  • Lipschitz Gradient: If $\nabla V$ is Lipschitz (in addition to log-concavity), a linear rate of convergence is established, with an explicit formula depending on the Lipschitz constant and the dimension of the problem.
  • Strongly Convex Case: If $V$ is moreover strongly convex (i.e., $\rho$ is strongly log-concave) with Lipschitz gradient, an exponential rate of convergence is proven, depending on the strong convexity parameter, the Lipschitz constant, and the dimension.
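For orientation, the MFVI objective and the generic CAVI update can be written schematically as follows; this is the standard formulation implied by the abstract, with the potential $V$ and factors $q_i$ as above rather than notation lifted verbatim from the paper:

$$
\min_{q = q_1 \otimes \cdots \otimes q_d} \ \mathrm{KL}(q \,\|\, \rho), \qquad \rho \propto e^{-V}, \quad V \text{ convex},
$$

and the CAVI update of the $i$-th factor, with the other factors held fixed, takes the explicit form

$$
q_i(x_i) \ \propto\ \exp\!\Big( -\, \mathbb{E}_{x_{-i} \sim q_{-i}}\big[ V(x_1, \dots, x_d) \big] \Big).
$$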

Wasserstein Geometry of MFVI

A significant part of the analysis views the problem through the lens of optimal transport. By framing MFVI over the Wasserstein space of probability measures, the authors exploit the fact that, while the objective is notoriously non-convex in the usual (linear) sense, it is displacement (geodesically) convex when the target is log-concave. This allows them to adapt techniques from the optimization literature on coordinate descent in Euclidean space, and this geometric perspective is central to the main results of the paper.
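As background, these are standard optimal-transport facts rather than statements taken from the paper: the displacement convexity can be seen from the decomposition

$$
\mathrm{KL}(q \,\|\, \rho) \;=\; \int q \log q \, dx \;+\; \int V \, dq \;+\; \mathrm{const},
$$

where the entropy term $\int q \log q$ is displacement convex in every dimension and the potential term $\int V \, dq$ is displacement convex precisely when $V$ is convex (McCann's convexity principle). Hence the MFVI objective is convex along Wasserstein geodesics whenever $\rho \propto e^{-V}$ is log-concave, even though it is non-convex in the usual linear sense.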

Implications and Further Developments

This paper has theoretical implications for the understanding of variational inference algorithms, specifically highlighting the importance of the log-concavity assumption in ensuring convergence. From a practical standpoint, the explicit convergence rates provided can guide the application of CAVI in Bayesian statistics and machine learning, particularly in determining the number of iterations needed to achieve a certain accuracy. Looking forward, the methodological framework introduced here opens avenues for analyzing the convergence of variational inference algorithms in more general settings, potentially extending to non-log-concave measures and different classes of variational families.
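As a back-of-the-envelope illustration (generic arithmetic under an assumed exponential rate, not the paper's specific constants): if each CAVI sweep contracts the KL suboptimality by a factor $1 - \kappa$, then reaching accuracy $\varepsilon$ from an initial gap $\varepsilon_0$ requires roughly

$$
n \;\ge\; \frac{1}{\kappa} \log \frac{\varepsilon_0}{\varepsilon}
$$

sweeps, so the number of iterations grows only logarithmically in the target accuracy.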

Conclusion

This paper by Arnese and Lacker advances our understanding of the convergence behavior of the CAVI algorithm. Through a careful mathematical analysis rooted in optimal transport theory, it establishes conditions under which CAVI converges, and quantifies the rate of convergence in terms of properties of the target distribution. This work stands to be a significant reference in the ongoing development of efficient and reliable variational inference methods.