
Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm (2006.09270v2)

Published 16 Jun 2020 in stat.ML, cs.LG, and math.OC

Abstract: We consider the task of sampling with respect to a log concave probability distribution. The potential of the target distribution is assumed to be composite, \textit{i.e.}, written as the sum of a smooth convex term, and a nonsmooth convex term possibly taking infinite values. The target distribution can be seen as a minimizer of the Kullback-Leibler divergence defined on the Wasserstein space (\textit{i.e.}, the space of probability measures). In the first part of this paper, we establish a strong duality result for this minimization problem. In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm. Our approach relies on viewing PSGLA as a primal dual algorithm and covers many cases where the target distribution is not fully supported. In particular, we show that if the potential is strongly convex, the complexity of PSGLA is $O(1/\varepsilon^2)$ in terms of the 2-Wasserstein distance. In contrast, the complexity of the Projected Langevin Algorithm is $O(1/\varepsilon^{12})$ in terms of total variation when the potential is convex.

Citations (36)

Summary

  • The paper presents a duality-based formulation minimizing the KL divergence in Wasserstein space using composite convex potentials.
  • It establishes improved complexity bounds of O(1/ε²) for PSGLA under strong convexity compared to classical Projected Langevin methods.
  • The analysis broadens PSGLA's applicability to Bayesian inference and high-dimensional sampling tasks with challenging non-smooth conditions.

Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

The paper "Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm" presents a detailed exploration into the complexities of sampling from a log-concave probability distribution whose potential is a composite function consisting of both smooth and non-smooth convex components. The authors seek to establish a primal-dual perspective on the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), extending traditional analyses to cases where the target distribution is not fully supported.

Problem Formulation and Duality

The paper begins with the problem of minimizing the Kullback-Leibler (KL) divergence over the Wasserstein space of probability measures. The divergence is taken relative to the target distribution, whose composite potential splits into a smooth convex term and a non-smooth convex term that may take infinite values. The crux of this formulation is a strong duality result, which then serves as the platform for analyzing the complexity of the algorithm at hand.
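
In symbols, and as a rough sketch in generic notation (with $f$ the smooth convex term, $g$ the non-smooth convex term, and $\pi$ the target; these symbols are our shorthand and not necessarily the paper's exact notation), the sampling task can be rewritten as a variational problem:

```latex
% Composite target with normalizing constant Z:
\pi(x) \;\propto\; \exp\!\big(-f(x) - g(x)\big),
\qquad
\operatorname{KL}(\mu \,\|\, \pi)
  \;=\; \int (f + g)\,\mathrm{d}\mu
  \;+\; \int \log\frac{\mathrm{d}\mu}{\mathrm{d}x}\,\mathrm{d}\mu
  \;+\; \log Z .
```

The target $\pi$ is then the minimizer of the right-hand side over the 2-Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$, and it is this minimization problem to which the strong duality result applies.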

The duality gap, arising from this primal-dual formulation, is central to understanding the algorithm's behavior, specifically the iterations of PSGLA as a generalized form of the Projected Langevin Algorithm. When the potential is strongly convex, the authors demonstrate that the complexity of PSGLA is significantly reduced, maintaining a dependence of $O(1/\varepsilon^2)$ in terms of the 2-Wasserstein distance. This represents a substantial improvement over the classical Projected Langevin Algorithm, which bears an $O(1/\varepsilon^{12})$ dependency in terms of total variation for convex potentials.
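
For concreteness, the sketch below implements a PSGLA-style iteration under the commonly stated update rule: a stochastic gradient step on the smooth term, injection of Gaussian noise, then the proximal map of the non-smooth term. The function names, signatures, and step-size handling are illustrative assumptions, not the paper's exact algorithmic setup.

```python
import numpy as np

def psgla(prox_g, grad_f_stoch, x0, step, n_iters, rng=None):
    """Sketch of a Proximal Stochastic Gradient Langevin iteration.

    x_{k+1} = prox_{step*g}( x_k - step * grad_f(x_k) + sqrt(2*step) * W_k )

    prox_g(v, step)  : proximal operator of the non-smooth term g
    grad_f_stoch(x)  : unbiased stochastic gradient of the smooth term f
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iters):
        noise = np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        x = prox_g(x - step * grad_f_stoch(x) + noise, step)
        samples.append(x.copy())
    return np.array(samples)
```

When $g \equiv 0$ the prox is the identity and this reduces to unadjusted Langevin; when $g$ is the indicator of a convex set the prox is a projection, recovering the Projected Langevin Algorithm as a special case.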

Implications

From a theoretical perspective, the results indicate that PSGLA can be viewed through the lens of primal-dual algorithms, offering a robust framework capable of handling challenging conditions such as potentials that are not entirely smooth or fully supported. This perspective broadens the applicability of Langevin dynamics to a wider class of optimization problems in Bayesian statistics and machine learning where such conditions are common.

Practically, these developments suggest that practitioners can employ PSGLA in scenarios where standard Langevin algorithms may falter due to the intricate nature of the composite potentials involved. For instance, in statistical physics or high-dimensional Bayesian inference tasks, this approach could offer enhanced efficiency and stability.
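
As a toy illustration of that constrained setting, the snippet below (continuing from the `psgla` sketch above, with an illustrative target and step size of our choosing) samples a standard Gaussian restricted to the non-negative orthant, where the non-smooth term is an indicator function and its prox is the Euclidean projection.

```python
# Example: sample N(0, I) restricted to the non-negative orthant.
# Here g is the indicator of the orthant, so prox_g is the projection.
d = 5
grad_f = lambda x: x                          # f(x) = ||x||^2 / 2 (strongly convex)
proj = lambda v, step: np.maximum(v, 0.0)     # prox of the indicator = projection
samples = psgla(proj, grad_f, x0=np.ones(d), step=1e-2, n_iters=20_000)
print(samples[10_000:].mean(axis=0))          # discard burn-in, inspect the mean
```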

Future Directions

The findings in this paper suggest several avenues for future research. Expanding upon the duality theory within the Wasserstein space to encompass a broader set of functionals could deepen our understanding and capability in probability measure optimization. Furthermore, extending these insights to other sampling algorithms might yield new methods with enhanced performance characteristics across various applications in statistics, data science, and numerical simulations.

In summary, this paper makes significant strides in reconciling primal-dual optimization concepts with stochastic gradient Langevin dynamics, presenting both theoretical advancements and practical computational benefits. As machine learning models continue to grow in complexity and scale, algorithms such as PSGLA that harness the power of duality offer promising pathways to more effective and efficient solutions.
