- The paper presents a method to accurately estimate a gradient term that standard contrastive divergence training of energy-based models (EBMs) discards, significantly improving training stability.
- It estimates this term efficiently by combining auto-differentiation through the MCMC sampling chain with nearest-neighbor entropy estimators.
- The approach further improves image generation quality and out-of-distribution detection by integrating data augmentation into the MCMC transitions and by processing inputs at multiple scales.
Improved Contrastive Divergence Training of Energy-Based Models
The paper "Improved Contrastive Divergence Training of Energy-Based Models" addresses several persistent challenges in training Energy-Based Models (EBMs) through contrastive divergence. The authors highlight notable advancements derived from incorporating additional gradient terms into the contrastive divergence objective, which have historically been overlooked or dismissed as negligible. This work introduces a method to estimate these terms more accurately, bringing attention to their significant role in stabilizing training processes and enhancing model robustness and performance across various tasks such as image generation and out-of-distribution (OOD) detection.
The primary focus is this third gradient term, which arises because changes to the energy function also change the MCMC samples drawn from it. The term turns out to be a critical determinant of training stability, yet traditional contrastive divergence formulations ignore it. The authors show that it can be estimated effectively by combining auto-differentiation through the sampling chain with a nearest-neighbor entropy estimator, and that adding the resulting loss term yields significant improvements in both the stability and the generation quality of EBMs. Its historical absence helps explain why EBMs have seen limited applicability and scaling in practice.
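A minimal PyTorch sketch of this estimate, assuming an `energy_fn` module that maps a batch of images to per-sample scalar energies; the function names, the k=1 nearest-neighbor entropy surrogate, and the hyperparameters are illustrative choices, not the authors' exact implementation:

```python
import torch

def langevin_samples(energy_fn, x, steps=60, step_size=10.0, noise=0.005):
    """Differentiable Langevin dynamics: create_graph=True keeps the
    sampling chain in the autograd graph, so gradients of the KL term
    can flow into theta through the samples themselves. (In practice
    one would truncate backpropagation to the final few steps.)"""
    x = x.detach().requires_grad_(True)
    for _ in range(steps):
        x = x + noise * torch.randn_like(x)
        grad, = torch.autograd.grad(energy_fn(x).sum(), x, create_graph=True)
        x = x - step_size * grad
    return x

def kl_term_loss(energy_fn, samples):
    """Surrogate for the usually-dropped gradient term: minimize the
    energy of the samples (with the energy network's parameters frozen,
    mirroring the paper's stop-gradient on theta) while maximizing a
    nearest-neighbor entropy estimate to keep the samples diverse."""
    for p in energy_fn.parameters():
        p.requires_grad_(False)
    energy_term = energy_fn(samples).mean()
    for p in energy_fn.parameters():
        p.requires_grad_(True)

    # Kozachenko-Leonenko-style entropy surrogate, up to additive
    # constants: log-distance of each sample to its nearest batch neighbor.
    flat = samples.flatten(start_dim=1)
    dists = torch.cdist(flat, flat)
    dists = dists + torch.eye(flat.size(0), device=flat.device) * 1e10  # mask self
    entropy_term = torch.log(dists.min(dim=1).values + 1e-10).mean()

    return energy_term - entropy_term  # added to the standard CD loss
```

Minimizing `energy_term - entropy_term` pushes the sampler toward low-energy yet diverse samples, which is the stabilizing effect the paper attributes to the recovered term.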
The authors further combine this objective with data augmentation and multi-scale processing. Applying augmentations to samples between MCMC runs improves the mixing of the transitions, while evaluating the energy at multiple image resolutions leverages the inherent compositionality of EBMs and improves spatial coherence without altering the underlying MCMC sampling process. These strategies increase the diversity and quality of generated samples and extend the utility of EBMs to applications such as compositional generation, where complex patterns are built by composing simpler elements. Together with contemporary architectural components such as self-attention, this modernizes the classic EBM framework, lifting its performance toward that of contemporary generative models such as GANs and autoregressive models on complex image datasets.
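A condensed sketch of the multi-scale idea, assuming each per-resolution network returns a batch of scalar energies; the class, the choice of scales, and the interpolation mode are illustrative, and the paper's actual architectures additionally use components such as self-attention:

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleEnergy(nn.Module):
    """Total energy is the sum of per-resolution energies computed on
    downsampled copies of the input, so coarse networks enforce global
    structure while fine networks enforce local detail."""

    def __init__(self, make_energy_net, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.nets = nn.ModuleList(make_energy_net() for _ in scales)

    def forward(self, x):
        total = 0.0
        for scale, net in zip(self.scales, self.nets):
            xs = x if scale == 1.0 else F.interpolate(
                x, scale_factor=scale, mode="bilinear", align_corners=False)
            total = total + net(xs)  # each net: (B, C, H, W) -> (B,) energies
        return total
```

Because a sum of energies is itself an energy, the same Langevin sampler sketched above can be used unchanged on the composite model.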
Experimental results underscore the effectiveness of the proposed improvements. Across CIFAR-10, CelebA-HQ, LSUN, and ImageNet 32x32, the modified approach demonstrates substantial gains, achieving FID scores close to those of state-of-the-art generative adversarial networks and significantly exceeding previous EBM-based methods. In compositional generation and OOD detection tasks, the revised training likewise shows promising results, suggesting that EBMs trained under this framework retain their probabilistic strengths and compositionality while overcoming the training instabilities that historically held them back.
The improvements presented in the paper point toward a robust evolution of EBMs: by revisiting and refining the foundational training procedure with modern computational tools, these models can begin to realize their theoretical potential across a broad range of machine learning domains. Future work may refine these methods further, optimize their computational efficiency, and explore additional domains such as text and video, where the compositional power of EBMs could model complex dependencies and hierarchical patterns.