Overview of "Learning Energy-Based Models by Diffusion Recovery Likelihood"
Energy-Based Models (EBMs) are a promising approach to probabilistic modeling, particularly in unsupervised learning, where they can serve as generative models without requiring labeled data. Despite these advantages, EBMs have been difficult to scale to high-dimensional datasets because both training and sampling depend on computationally expensive Markov chain Monte Carlo (MCMC) procedures. In the paper "Learning Energy-Based Models by Diffusion Recovery Likelihood," the authors propose a method based on diffusion recovery likelihood that addresses these challenges, showing promising results in both training tractability and sample fidelity.
Diffusion Recovery Likelihood Method
The paper introduces diffusion recovery likelihood, a novel approach to learning EBMs on increasingly noisy versions of a dataset. The method draws inspiration from diffusion probabilistic models, particularly the works of Sohl-Dickstein et al. (2015) and Ho et al. (2020). The key idea is to learn a sequence of EBMs, each trained by maximizing a recovery likelihood: the conditional probability of the data at a given noise level given its noisier version at the next higher level.
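To make the objective concrete, the recovery likelihood can be written explicitly. Following the paper's formulation (with notation lightly simplified), let f_theta denote the negative energy at a given noise level and let the noisy observation be a Gaussian perturbation of a clean sample:

$$
\tilde{\mathbf{x}} = \mathbf{x} + \sigma \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
$$

$$
p_\theta(\mathbf{x} \mid \tilde{\mathbf{x}}) = \frac{1}{\tilde{Z}_\theta(\tilde{\mathbf{x}})} \exp\!\left( f_\theta(\mathbf{x}) - \frac{1}{2\sigma^2} \lVert \tilde{\mathbf{x}} - \mathbf{x} \rVert^2 \right).
$$

Training maximizes the log of this conditional over observed pairs of clean and noisy samples; the quadratic term anchors the distribution near the noisy observation.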
The diffusion recovery likelihood simplifies the training objective by modeling conditional distributions, which are easier to approximate than marginal distributions. The key observation is that these conditionals are strongly localized around the noisy observation: the quadratic term in the expression above concentrates the distribution near the conditioner, so each conditional is far less multi-modal than the marginal and much easier to sample from, sidestepping much of the complexity of multi-modal high-dimensional spaces.
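Because the conditional is localized, it can be sampled with short-run Langevin dynamics. Below is a minimal PyTorch sketch of such a conditional sampler; the function name conditional_langevin and the step size and step count are illustrative placeholders, not the paper's tuned hyperparameters.

```python
import torch

def conditional_langevin(f_theta, x_tilde, sigma, n_steps=30, step_size=0.01):
    # Sample from p(x | x_tilde) ∝ exp(f_theta(x) - ||x_tilde - x||^2 / (2 sigma^2))
    # via Langevin dynamics, initializing the chain at the noisy observation.
    x = x_tilde.clone()
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad_f = torch.autograd.grad(f_theta(x).sum(), x)[0]
        # Gradient of the conditional log-density: the learned term plus a
        # Gaussian pull toward x_tilde that keeps the chain localized.
        grad_log_p = grad_f + (x_tilde - x.detach()) / sigma ** 2
        x = x.detach() + 0.5 * step_size ** 2 * grad_log_p + step_size * torch.randn_like(x)
    return x.detach()
```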
Implementation and Results
The authors demonstrate the method's efficacy on image generation tasks using several benchmark datasets, including CIFAR-10, CelebA, and LSUN. The generated samples achieve high fidelity and competitive metrics such as Fréchet Inception Distance (FID) and Inception Score, often outperforming existing GAN-based and score-based methods while using substantially fewer computational resources during training.
Notably, on CIFAR-10 the method achieves an FID of 9.58 and an Inception Score of 8.30, surpassing the majority of GAN models. The paper also explores very long MCMC sampling chains, an area previously fraught with convergence issues. The authors demonstrate that, with a thousand diffusion time steps, their long-run MCMC chains remain stable and produce realistic samples, which is crucial for treating the learned energies as valid potentials, a persistent criticism of previous EBM training methods.
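For generation, sampling runs from the highest noise level down to clean data, with the sample at each level serving as the noisy conditioner for the level below. The schematic below reuses the conditional_langevin sketch from earlier; the per-level networks f_thetas and noise scales sigmas are illustrative names, and the diffusion's scaling factors are omitted for brevity.

```python
import torch

def progressive_sample(f_thetas, sigmas, shape):
    # Start from Gaussian noise at the coarsest level and refine progressively:
    # the sample at level t + 1 conditions the Langevin chain at level t.
    x = torch.randn(shape)
    for t in reversed(range(len(sigmas))):
        x = conditional_langevin(f_thetas[t], x_tilde=x, sigma=sigmas[t])
    return x
```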
Practical and Theoretical Implications
This research opens pathways toward more efficient unsupervised learning with EBMs, offering a scalable route to high-dimensional data with realistic sampling outcomes. By aligning the training objective closely with diffusion models and denoising techniques, the paper points toward more flexible schedules of noise levels and sampling steps, which could benefit a range of generative-modeling applications.
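As one illustration of that flexibility, the noise levels can be spaced geometrically between a small and a large standard deviation; the endpoints and level count below are arbitrary placeholders rather than values from the paper.

```python
import numpy as np

# Geometrically spaced noise standard deviations, from fine to coarse.
sigmas = np.geomspace(0.01, 1.0, num=6)
```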
The diffusion recovery likelihood also enables estimates of the normalized density of the data, potentially deepening the theoretical understanding and applications of EBMs. The practical implications extend to image inpainting, interpolation, and other tasks requiring high-quality generative models.
Future Directions
The work poses intriguing possibilities for future exploration, such as scaling the method to higher-resolution images and extending it to data modalities beyond images. Another open direction is to combine high-fidelity sample generation with stable long-run sampling in a single model, so that strong sample quality and a valid energy potential are achieved simultaneously.
This paper is a significant contribution to the ongoing development of unsupervised learning, offering solutions to key challenges while paving the way for more efficient and scalable EBM applications.