- The paper introduces Annealing Flow (AF), a generative model that integrates annealing with continuous normalizing flows to transition from an easy-to-sample to a target distribution.
- Experiments demonstrate AF's strong performance: efficient, balanced mode exploration and lower MMD and Wasserstein distances than traditional MCMC, HMC, and related sampling methods.
- AF employs learned transport maps via neural ODEs and dynamic optimal transport objectives, offering significant practical advantages in high-dimensional Bayesian inference and physics-based machine learning.
Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions
The paper "Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions" introduces a novel algorithm, Annealing Flow (AF), aimed at addressing the challenge of sampling from high-dimensional, multi-modal distributions. This problem is critical in various domains, such as statistical Bayesian inference and physics-based machine learning, where traditional methods like MCMC and its variants often struggle, particularly with slow mixing and inefficient mode exploration.
Methodological Innovations
The primary innovation in this work is the development of AF, a continuous normalizing flow-based approach that leverages annealing principles to transition samples from an easy-to-sample distribution to the target distribution. The key component of AF is a learned transport map that enables effective exploration of modes in high-dimensional spaces. Unlike diffusion-based methods, which must first be trained on samples drawn from the unknown data distribution, AF's training does not depend on preliminary samples from the target distribution, which significantly enhances its applicability and flexibility.
Theoretical Framework
AF constructs a sequence of intermediate distributions $\tilde{f}_k(x)$ that interpolate between the initial easy-to-sample distribution $\pi_0(x)$ and the unnormalized target distribution $\tilde{q}(x)$. This sequence is governed by an annealing schedule $\beta_k$, ensuring a gradual transition from $\pi_0(x)$ to $\tilde{q}(x)$.
These intermediate distributions are defined (up to normalization) as
$$\tilde{f}_k(x) \propto \pi_0(x)^{1-\beta_k}\,\tilde{q}(x)^{\beta_k},$$
with $\beta_k$ increasing from 0 to 1, providing a smooth pathway from the initial distribution to the target.
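The tempered path is easiest to see in log space, where it is a convex combination of the two log-densities. The following minimal sketch assumes a standard-normal base and a toy two-mode target; the function names (`log_pi0`, `log_q`, `annealed_logpdf`) are illustrative and not from the paper.

```python
import numpy as np

def log_pi0(x):
    """Log-density of the easy-to-sample base pi_0, a standard normal (up to a constant)."""
    return -0.5 * np.sum(x**2, axis=-1)

def log_q(x):
    """Unnormalized log-density of a toy two-mode Gaussian-mixture target q~."""
    m = 4.0
    return np.logaddexp(-0.5 * np.sum((x - m)**2, axis=-1),
                        -0.5 * np.sum((x + m)**2, axis=-1))

def annealed_logpdf(x, beta):
    """log f~_k(x) = (1 - beta) * log pi_0(x) + beta * log q~(x), up to a constant."""
    return (1.0 - beta) * log_pi0(x) + beta * log_q(x)

# Annealing schedule: beta_0 = 0 (base) increases to beta_K = 1 (target).
betas = np.linspace(0.0, 1.0, 6)
x = np.zeros((1, 2))
path = [annealed_logpdf(x, b)[0] for b in betas]
```

At `beta = 0` this recovers the base log-density exactly, and at `beta = 1` the target, so each consecutive pair of intermediate distributions differs only slightly, which is what makes each transport map easy to learn.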
The continuous normalizing flow is governed by a set of neural ODEs, where each transport map $T_k$ minimizes a dynamic optimal transport objective. This objective combines a KL divergence term with a dynamic Wasserstein-2 regularization term that encourages smooth, efficient mode exploration:
$$T_k = \arg\min_{T}\Big\{ \mathrm{KL}\big(T_{\#}\tilde{f}_{k-1}\,\|\,\tilde{f}_k\big) + \gamma \int_{t_{k-1}}^{t_k} \mathbb{E}_{x(t)\sim \rho_k(\cdot,\,t)}\,\big\|v_k(x(t),t)\big\|^2\, dt \Big\},$$
subject to the continuity equation that evolves the density $\rho_k(\cdot, t)$ over time.
Empirical Evaluation
The efficacy of AF is demonstrated through extensive experiments on various complex distributions, including Gaussian Mixture Models (GMM), truncated normals, funnel distributions, and high-dimensional, exponential-weighted Gaussian distributions. The results highlight AF's superior performance in balancing mode exploration and efficiently handling multi-modal distributions compared to state-of-the-art methods like MCMC, HMC, PT, SVGD, and NN-based approaches.
Strong Numerical Results
AF provides substantial improvements over existing methods, particularly in high-dimensional settings. For instance, in sampling tasks involving a 50-dimensional, exponential-weighted Gaussian distribution with 1024 modes, AF demonstrated efficient and balanced mode exploration, with sampling times significantly reduced compared to traditional MCMC methods. Additionally, AF consistently showed lower MMD and Wasserstein distances compared to other methods across various tasks.
Practical and Theoretical Implications
Practically, AF enhances the efficiency and applicability of generative models in high-dimensional, multi-modal contexts, offering a faster and more balanced approach to sampling. Theoretically, it bridges the gap between normalizing flow models and optimal transport by integrating the annealing philosophy into the transport map learning process.
Future Directions
Future developments could explore extending AF to more complex datasets and broader applicability in real-world machine learning tasks. Additionally, further research may investigate deeper theoretical connections and refinements in the objective function, potentially enhancing the robustness and efficiency of the annealing schedule.
In summary, "Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions" presents a significant advancement in the field of generative models, offering a robust solution to the longstanding challenge of efficiently sampling from complex, high-dimensional distributions.