- The paper introduces KALE, a relaxation of the KL divergence that remains well-defined for distributions with non-overlapping supports.
- It develops an RKHS-regularized optimization framework that interpolates between KL divergence and MMD for robust analysis of distribution convergence.
- It demonstrates both theoretical convergence and practical applications via a particle descent algorithm, offering improvements for sampling in models like GANs.
A Professional Overview of "KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support"
Among discrepancy measures for probability distributions, the Kullback-Leibler (KL) divergence is foundational, broadly employed in statistical inference and Bayesian analysis. However, the KL divergence requires the compared distributions to share support: it becomes uninformative or infinite when the distributions lack absolute continuity or are mutually singular. Glaser, Arbel, and Gretton introduce the KL Approximate Lower-bound Estimator (KALE) as a relaxed divergence that addresses these limitations.
Analysis and Numerical Assessment
This paper develops the KALE framework, which extends the KL divergence by incorporating a regularization strategy based on a Reproducing Kernel Hilbert Space (RKHS). The authors show that KALE interpolates between the KL divergence and the Maximum Mean Discrepancy (MMD), two widely used discrepancies for assessing convergence of distributions. KALE remains well-defined when the supports of the two distributions differ while retaining sensitivity to their geometry, making it particularly advantageous when the target distribution is concentrated on a low-dimensional manifold.
KALE is defined through an optimization problem derived from the Fenchel dual formulation of the KL divergence, with the dual (witness) function restricted to an RKHS and penalized by its RKHS norm; the resulting problem is convex in the witness function and computationally tractable. This construction accommodates mutually singular distributions, mitigates issues associated with support mismatch, and yields a smoother objective over which a Wasserstein gradient flow can be defined.
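To make the construction concrete, here is a hedged sketch of the regularized dual objective; the notation and the exact scaling convention (for instance, whether a factor of (1 + λ) multiplies the supremum) are assumptions for illustration rather than a verbatim reproduction of the paper's definition:

```latex
% Fenchel-dual form of the KL divergence (standard identity):
%   KL(P \| Q) = \sup_{h} \; \mathbb{E}_{P}[h] - \mathbb{E}_{Q}[e^{h} - 1].
% KALE restricts h to an RKHS \mathcal{H} and penalizes its norm
% (the (1 + \lambda) scaling is an assumed normalization):
\mathrm{KALE}_{\lambda}(P \,\|\, Q)
  = (1 + \lambda) \sup_{h \in \mathcal{H}}
    \Big\{ \mathbb{E}_{P}[h] - \mathbb{E}_{Q}\!\left[e^{h} - 1\right]
           - \tfrac{\lambda}{2} \lVert h \rVert_{\mathcal{H}}^{2} \Big\}.
```

Heuristically, as the regularization weight tends to zero the penalty vanishes and the objective approaches the KL dual restricted to the RKHS, while for large regularization the optimal witness is small and a second-order expansion of the exponential term relates the objective to a squared-MMD-type quantity; this is the interpolation between KL and MMD referenced above.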
Methodological Contributions
The paper's central innovation is its formulation of the KALE gradient flow, which directly addresses the difficulties arising from disjoint supports. The authors establish global convergence of the KALE flow under specified smoothness conditions, supported by both theoretical analysis and empirical evidence. The proposed particle descent algorithm offers a practical implementation pathway, requiring only samples from the source and target distributions.
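To illustrate how such a particle descent might look in practice, below is a minimal sketch in Python/NumPy. It is not the authors' implementation: the finite-dimensional kernel expansion for the witness, the Gaussian kernel, the bandwidth `sigma`, the inner gradient-ascent solver, and all step sizes are assumptions made for this example. Each outer step fits a witness function by maximizing the regularized dual objective over samples, then moves the source particles along the negative gradient of that witness.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Gaussian kernel matrix k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_witness(X, Y, lam=0.1, sigma=1.0, lr=0.1, iters=200):
    """Gradient ascent on the regularized dual objective
        mean(h(X)) - mean(exp(h(Y)) - 1) - (lam / 2) ||h||_H^2,
    with the ansatz h(.) = sum_j alpha_j k(., Z_j), Z = [X; Y]."""
    Z = np.concatenate([X, Y], axis=0)
    K_XZ = gaussian_kernel(X, Z, sigma)   # h evaluated on source particles
    K_YZ = gaussian_kernel(Y, Z, sigma)   # h evaluated on target samples
    K_ZZ = gaussian_kernel(Z, Z, sigma)   # RKHS norm: ||h||^2 = alpha^T K_ZZ alpha
    alpha = np.zeros(Z.shape[0])
    for _ in range(iters):
        hY = K_YZ @ alpha
        grad = (K_XZ.mean(axis=0)
                - (np.exp(hY)[:, None] * K_YZ).mean(axis=0)
                - lam * (K_ZZ @ alpha))
        alpha += lr * grad
    return alpha, Z

def kale_particle_descent(X0, Y, steps=100, step_size=0.1, lam=0.1, sigma=1.0):
    """Re-fit the witness at every step and move the source particles
    along -grad h*(x), the negative gradient of the fitted witness."""
    X = X0.copy()
    for _ in range(steps):
        alpha, Z = fit_witness(X, Y, lam=lam, sigma=sigma)
        # grad_x h(x) = sum_j alpha_j * (-(x - z_j) / sigma^2) * k(x, z_j)
        K = gaussian_kernel(X, Z, sigma)              # (n, m)
        diff = X[:, None, :] - Z[None, :, :]          # (n, m, d)
        grad_h = (alpha[None, :, None] * (-diff / sigma ** 2)
                  * K[:, :, None]).sum(axis=1)
        X -= step_size * grad_h
    return X

# Toy usage: pull Gaussian particles toward a shifted Gaussian target.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 2))
Y = rng.normal(loc=3.0, size=(100, 2))
X_final = kale_particle_descent(X0, Y, steps=50)
```

The nested structure (fit the witness, then move the particles) mirrors the time-discretized gradient-flow perspective described in the paper; a serious implementation would typically warm-start the inner problem and use a more careful optimizer and kernel bandwidth selection.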
Theoretical and Practical Implications
From a theoretical standpoint, KALE is not merely an empirical tool but a divergence in its own right. The authors show, within the kernelized function spaces considered, that it satisfies properties such as weak continuity and metrization of weak convergence. These properties establish KALE as a discrepancy that does not require the compared distributions to admit densities with respect to a common reference measure.
On a practical front, the computational strategies described, particularly the KALE particle descent, suggest potential improvements over traditional gradient flow approaches. Such improvements are especially relevant to training implicit generative models such as Generative Adversarial Networks (GANs) and to navigating the optimization difficulties that arise in variational inference.
Convergence and Future Prospects
The paper also discusses the convergence dynamics of KALE, drawing parallels with existing methods, including MMD flows and discretizations of the KL flow such as the Unadjusted Langevin Algorithm (ULA). It argues that, owing to the smoothing effect of the regularization, KALE provides a viable sample-based approximation, opening new avenues for sampling methodologies that rely on geometric consistency.
In summary, KALE enriches the landscape of distributional discrepancy measures by providing an adaptive interpolation between two established divergences. Its theoretical grounding and demonstrated practical efficiency point toward methods with a degree of flexibility that more rigidly defined divergences cannot offer.