
Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs (2403.13748v2)

Published 20 Mar 2024 in stat.ML, cs.LG, and stat.CO

Abstract: Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though $p$ itself does not factorize. We show that this mismatch leads to an impossibility theorem: if $p$ does not factorize, then any factorized approximation $q \in Q$ can correctly estimate at most one of the following three measures of uncertainty: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance (which can be related to the entropy). In practice, the best variational approximation in $Q$ is found by minimizing some divergence $D(q,p)$ between distributions, and so we ask: how does the choice of divergence determine which measure of uncertainty, if any, is correctly estimated by VI? We consider the classic Kullback-Leibler divergences, the more general Rényi divergences, and a score-based divergence which compares $\nabla \log p$ and $\nabla \log q$. We provide a thorough theoretical analysis in the setting where $p$ is a Gaussian and $q$ is a (factorized) Gaussian. We show that all the considered divergences can be ordered based on the estimates of uncertainty they yield as objective functions for VI. Finally, we empirically evaluate the validity of this ordering when the target distribution $p$ is not Gaussian.

Authors (3)
  1. Charles C. Margossian (20 papers)
  2. Loucas Pillaud-Vivien (19 papers)
  3. Lawrence K. Saul (9 papers)

Summary

  • The paper presents an impossibility theorem showing that factorized Gaussian VI cannot simultaneously match a target’s variance, precision, and entropy.
  • It compares divergences such as reverse KL, forward KL, Rényi, and score-based, revealing how each emphasizes different aspects of the target distribution.
  • Numerical experiments validate the divergence ordering and show that, for each target, a unique Rényi parameter α yields an approximation that matches the target's entropy.

An Analysis of Divergences for Variational Inference with Diagonal Gaussian Approximations

Introduction

Variational inference (VI) is a mainstay of computational statistics and machine learning, offering a tractable means of approximating complex distributions. At the core of VI lies the choice of divergence to minimize between the approximating distribution and the target distribution. While the Kullback-Leibler (KL) divergence enjoys widespread use, alternative divergences have been proposed, yet their comparative behavior remains under-explored. This paper investigates how the choice of divergence influences the efficacy of VI, specifically when approximating a Gaussian distribution with a Gaussian whose covariance matrix is diagonal. We examine the KL divergences, Rényi divergences, and a score-based divergence, showing how each divergence's properties translate into the behavior of the resulting VI approximations.

Impossibility Theorem for FG-VI

At the heart of our analysis is an impossibility theorem delineating the inherent trade-offs of factorized Gaussian variational inference (FG-VI). The theorem shows that no factorized approximation can simultaneously match the target distribution's marginal variances, marginal precisions, and entropy. As a consequence, the choice of divergence tailors the VI approximation toward capturing specific characteristics of the target distribution while inherently compromising on others.
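
To make this concrete in the Gaussian setting analyzed in the paper, suppose the target is $p = \mathcal{N}(\mu, \Sigma)$ and the approximation is $q = \mathcal{N}(\nu, \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2))$. Matching the marginal variances requires $\sigma_i^2 = \Sigma_{ii}$; matching the marginal precisions requires $\sigma_i^{-2} = (\Sigma^{-1})_{ii}$; and matching the generalized variance (equivalently, the entropy) requires $\prod_i \sigma_i^2 = \det \Sigma$. When $\Sigma$ has nonzero off-diagonal entries, these requirements are mutually incompatible: $1/(\Sigma^{-1})_{ii}$ is the conditional variance of coordinate $i$ given the others, which is strictly smaller than the marginal variance $\Sigma_{ii}$, and Hadamard's inequality gives $\det \Sigma < \prod_i \Sigma_{ii}$. Hence at most one of the three measures can be matched; this is only a sketch of the Gaussian case, and the paper states the theorem more generally.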

Divergence Analysis and Ordering

The paper then examines the optimizations induced by different divergences in the Gaussian setting. The reverse KL divergence is shown to match the target's marginal precisions while underestimating its marginal variances and entropy. Conversely, the forward KL divergence matches the target's marginal variances at the cost of the precisions and overestimates the entropy. The Rényi divergences, parameterized by $\alpha$, interpolate between these behaviors, offering a tuning knob between variance and precision matching; for one specific value of $\alpha$, the approximation matches the target's entropy exactly.
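
In the Gaussian case, the reverse- and forward-KL solutions have simple closed forms, and a few lines of code suffice to see the resulting under- and over-estimation. The snippet below is a minimal sketch, not code from the paper, for a bivariate target with correlation 0.8; the variable names are illustrative.

```python
# Sketch: closed-form factorized-Gaussian fits to a correlated 2-D Gaussian target,
# illustrating the reverse-KL / forward-KL trade-off described above.
import numpy as np

rho = 0.8
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])           # target covariance (unit marginal variances)
Prec = np.linalg.inv(Sigma)              # target precision matrix

# Reverse KL, min_q KL(q || p): the optimal diagonal q matches the marginal
# precisions, so its variances are 1 / (Sigma^{-1})_{ii}.
var_reverse = 1.0 / np.diag(Prec)        # [0.36, 0.36] -> underestimates

# Forward KL, min_q KL(p || q): the optimal diagonal q matches the marginal variances.
var_forward = np.diag(Sigma)             # [1.0, 1.0] -> matches

def diag_gaussian_entropy(variances):
    """Differential entropy of a diagonal Gaussian with the given variances."""
    d = len(variances)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.sum(np.log(variances)))

target_entropy = 0.5 * (2 * np.log(2 * np.pi * np.e) + np.log(np.linalg.det(Sigma)))

print("target marginal variances :", np.diag(Sigma))
print("reverse-KL variances      :", var_reverse)
print("forward-KL variances      :", var_forward)
print("entropy (target / reverse / forward):",
      target_entropy,
      diag_gaussian_entropy(var_reverse),   # below the target entropy
      diag_gaussian_entropy(var_forward))   # above the target entropy
```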

The score-based divergences introduce a distinct perspective, where the optimization problem transforms into a quadratic program. Notably, these divergences can predict marginal variances that are zero or infinite, a phenomenon we term "variational collapse," marking a stark departure from the KL and Rényi frameworks.
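
For intuition only, the sketch below fits a diagonal Gaussian by numerically minimizing a generic Fisher-type score criterion, $\mathbb{E}_q\|\nabla \log p - \nabla \log q\|^2$. This is a simplified stand-in, not the specific weighted score-based divergence (or its quadratic-program formulation) analyzed in the paper, and it is not claimed to reproduce variational collapse.

```python
# Sketch (assumption): fit q = N(0, diag(s)) to p = N(0, Sigma) by minimizing the
# Fisher-type objective E_q || grad log p(x) - grad log q(x) ||^2, which for
# Gaussians equals tr(A^2 D) with A = D^{-1} - Sigma^{-1} and D = diag(s).
import numpy as np
from scipy.optimize import minimize

rho = 0.8
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
Prec = np.linalg.inv(Sigma)

def score_objective(log_s):
    s = np.exp(log_s)                    # optimize in log space to keep s > 0
    A = np.diag(1.0 / s) - Prec
    D = np.diag(s)
    return np.trace(A @ A @ D)

res = minimize(score_objective, x0=np.zeros(2), method="L-BFGS-B")

# Compare with the closed-form KL fits from the previous snippet; for this
# particular objective and target, the fitted variances shrink below even the
# reverse-KL values.
print("score-based fit variances:", np.exp(res.x))
print("reverse-KL variances     :", 1.0 / np.diag(Prec))
print("forward-KL variances     :", np.diag(Sigma))
```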

Building on these insights, the paper establishes a comprehensive ordering of the divergences by the marginal variances they estimate, which extends naturally to orderings by precision and entropy. This hierarchy can guide the selection of a divergence according to the inferential goals of VI.

Empirical Validation and Entropy Matching

The theoretical insights are juxtaposed with numerical experiments on several models, ranging from Gaussian targets to hierarchical and time-series models. These experiments confirm the divergence ordering in specific cases, while also highlighting the prevalence of the entropy-based ordering when approximating non-Gaussian targets. Remarkably, for each target there exists a unique Rényi parameter $\alpha$ at which the approximation matches the target's entropy, although finding this $\alpha$ is non-trivial.

Conclusion

This paper makes strides in unpacking the effects of divergence choice on the outcomes of VI, particularly in the setting of factorized Gaussian approximations. The impossibility theorem and the systematic ordering of divergences offer a conceptual framework for anticipating the behavior of VI approximations, helping practitioners align their divergence choice with their inferential objectives. Future work may explore these trade-offs in richer variational families or under different model assumptions, paving the way for more nuanced applications of VI.