No MCMC for me: Amortized sampling for fast and stable training of energy-based models (2010.04230v3)

Published 8 Oct 2020 in cs.LG and cs.AI

Abstract: Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty. Despite recent advances, training EBMs on high-dimensional data remains a challenging problem as the state-of-the-art approaches are costly, unstable, and require considerable tuning and domain expertise to apply successfully. In this work, we present a simple method for training EBMs at scale which uses an entropy-regularized generator to amortize the MCMC sampling typically used in EBM training. We improve upon prior MCMC-based entropy regularization methods with a fast variational approximation. We demonstrate the effectiveness of our approach by using it to train tractable likelihood models. Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and stable training. This allows us to extend JEM models to semi-supervised classification on tabular data from a variety of continuous domains.

Citations (68)

Summary

  • The paper presents VERA, a novel variational approach that uses amortized sampling and entropy regularization to eliminate the need for MCMC in EBM training.
  • It demonstrates improved training stability and higher likelihood scores on benchmark datasets by streamlining the gradient estimation process.
  • The approach extends to joint energy models, enhancing semi-supervised classification and generative performance while offering promising avenues for future research.

An Approach to Amortized Sampling for Energy-Based Models

Energy-Based Models (EBMs) have been established as a potent framework in machine learning, offering flexibility in representing uncertainty across domains such as image generation, out-of-distribution detection, and semi-supervised learning. However, training EBMs on high-dimensional data poses significant challenges, primarily due to the computational cost and instability of methods such as Markov Chain Monte Carlo (MCMC). This paper introduces a novel approach, Variational Entropy Regularized Approximate maximum likelihood (VERA), which addresses these issues by combining amortized sampling with an entropy-regularized generator. The method is designed to replace MCMC sampling in the training process, aiming for faster and more stable training of EBMs, particularly in scalable applications.

Overview of the Technique

The central contribution of the paper is an alternative to MCMC-based EBM training built on a variational approximation. The authors recast maximum likelihood training of EBMs as a bi-level variational optimization problem in which a generator supplies the negative-phase samples that MCMC would otherwise provide. The generator is trained with an entropy regularizer to track the model distribution, and the gradient of the entropy term is computed through a variational approximation that avoids the computational expense typical of MCMC methods. The entropy regularizer keeps the generated samples diverse, thereby improving the quality of the EBM's likelihood estimation.
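To make the training structure concrete, below is a minimal PyTorch sketch of what such an amortized training loop could look like. The network names (f_net, g_net), shapes, and hyperparameters are illustrative assumptions rather than the authors' implementation, and the entropy regularizer is left as a placeholder here; one possible form of it is sketched after the Key Contributions list.

```python
# Minimal sketch of amortized EBM training (illustrative names and shapes,
# not the authors' code). Generator samples replace MCMC in the negative phase;
# the entropy regularizer is only indicated by a placeholder here.
import torch
import torch.nn as nn

dim, latent_dim, sigma, lam = 2, 2, 0.1, 1.0
f_net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))          # unnormalized log-density f_theta
g_net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, dim)) # generator g_phi

opt_f = torch.optim.Adam(f_net.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(g_net.parameters(), lr=1e-4)

def sample_q(batch):
    """Draw x = g_phi(z) + sigma * eps, the amortized stand-in for MCMC samples."""
    z = torch.randn(batch, latent_dim)
    return g_net(z) + sigma * torch.randn(batch, dim), z

for step in range(1000):
    x_data = torch.randn(128, dim)            # stand-in for a real data batch

    # EBM update: raise f_theta on data, lower it on generator samples
    # (the usual positive/negative-phase gradient, with no MCMC chain).
    x_fake, _ = sample_q(128)
    loss_f = f_net(x_fake.detach()).mean() - f_net(x_data).mean()
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()

    # Generator update: push samples toward high f_theta while an entropy
    # term (placeholder below) keeps them diverse, roughly minimizing
    # KL(q_phi || p_theta).
    x_fake, z = sample_q(128)
    entropy_term = torch.zeros(())            # placeholder for the entropy-gradient surrogate
    loss_g = -f_net(x_fake).mean() - lam * entropy_term
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```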

Key Contributions

  1. Entropy Regularization: A novel estimator for the gradient of the generator's entropy, built on a variational approximation, streamlines training without the sequential, costly computations of MCMC (a sketch of one possible form of this estimator follows the list below).
  2. Improved Likelihood: Experiments with tractable models show that the method trains models achieving higher likelihood scores than traditional MCMC-based approaches.
  3. Application to Joint Energy Models (JEM): The paper extends its findings to JEMs, showing that VERA stabilizes and accelerates training relative to earlier MCMC-based approaches while improving semi-supervised classification performance.
  4. Generative and Discriminative Performance: In empirical evaluations on datasets such as CIFAR10, CIFAR100, and SVHN, VERA exhibits competitive classification accuracy and generative quality, as indicated by metrics like FID.
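As a concrete illustration of the first contribution, the sketch below shows one way a variational entropy-gradient surrogate could be written. The generator form x = g_net(z) + sigma * eps, the fixed-width Gaussian posterior over latents, and all parameter names are assumptions made for this sketch; it is not the paper's exact estimator.

```python
# Hedged sketch of a variational entropy-gradient surrogate (assumes a
# generator of the form x = g_net(z) + sigma * eps and an illustrative
# Gaussian approximate posterior over z given x).
import torch

def entropy_grad_surrogate(g_net, z, sigma, post_std=0.1, n_post=20):
    """Scalar whose gradient w.r.t. the generator parameters approximates
    the gradient of the entropy of the generator's sample distribution."""
    g_z = g_net(z)
    x = g_z + sigma * torch.randn_like(g_z)            # reparameterized sample
    with torch.no_grad():
        # Approximate the score grad_x log q(x) by the posterior expectation
        # E_{z' ~ xi(z|x)}[grad_x log q(x|z')], with q(x|z') = N(g_net(z'), sigma^2 I),
        # using samples z' from an assumed Gaussian posterior centered at z.
        z_post = z.unsqueeze(0) + post_std * torch.randn(n_post, *z.shape)
        score = (-(x.unsqueeze(0) - g_net(z_post)) / sigma ** 2).mean(0)
    # Reparameterized entropy gradient: grad_phi H(q_phi) = -E[(dx/dphi)^T grad_x log q(x)],
    # so maximizing this surrogate pushes the generator toward higher entropy.
    return -(x * score).sum(dim=-1).mean()
```

In the training loop sketched earlier, a value like this would replace the zero placeholder in the generator loss, e.g. loss_g = -f_net(x_fake).mean() - lam * entropy_grad_surrogate(g_net, z, sigma).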

Implications and Future Directions

The proposed method not only improves computational efficiency and stability but also broadens the applicability of EBMs to domains where traditional methods falter. This flexibility is especially valuable for semi-supervised learning across diverse data types, offering a viable option where domain-specific augmentations cannot be leveraged.

Future research could focus on refining the entropy regularizer and exploring alternative architectures for the generative model, potentially yielding better performance and more robust applications. This could ease the deployment of EBMs in real-world scenarios, extending beyond image data to other continuous domains.

Ultimately, VERA represents a significant step toward making energy-based models practical, providing a framework that could change standard training practice for these models across a variety of applications.
