- The paper introduces a novel deep generative framework that trains probabilistic models using a learned Markov chain transition operator.
- It generalizes denoising autoencoders by incorporating latent variables to consistently estimate the data distribution.
- Experimental results on image datasets demonstrate efficient sample generation and robust handling of structured outputs and missing data.
Deep Generative Stochastic Networks Trainable by Backprop
In this paper, the authors introduce a new approach to training deep probabilistic models through a framework they call Generative Stochastic Networks (GSNs). The GSN framework is an alternative to traditional maximum likelihood estimation: rather than fitting an explicit density, a GSN estimates the data-generating distribution implicitly by learning the transition operator of a Markov chain whose stationary distribution is that data distribution. Because the transition operator is a conditional distribution over small stochastic steps, learning it turns complex unsupervised density estimation into a task much closer to supervised function approximation.
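The idea can be stated compactly. Using notation introduced here for illustration (not taken from the summary above), write $C(\tilde{x} \mid x)$ for a fixed corruption process and $P_\theta(x \mid \tilde{x})$ for the learned reconstruction distribution; the training objective and the induced transition operator then look roughly like this:

```latex
% Training: a supervised-style conditional maximum-likelihood objective,
% where x is a training example and \tilde{x} a corrupted version of it.
\mathcal{L}(\theta)
  = \mathbb{E}_{x \sim \mathcal{D}}\,
    \mathbb{E}_{\tilde{x} \sim C(\tilde{x} \mid x)}
    \left[ \log P_\theta(x \mid \tilde{x}) \right]

% Generation: the learned conditional, composed with the corruption process,
% defines the transition operator of a Markov chain over data space.
T_\theta(x_{t+1} \mid x_t)
  = \int C(\tilde{x} \mid x_t)\, P_\theta(x_{t+1} \mid \tilde{x})\, d\tilde{x}
```

Under the conditions analyzed in the paper, the stationary distribution of this chain is a consistent estimator of the data-generating distribution, which is what justifies sampling from the chain in place of sampling from an explicit density model.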
Key Contributions
The paper offers several key contributions to the field of unsupervised learning and deep generative models:
- Alternative Training Principles: The authors propose an alternative training principle for probabilistic models. Because the transition operator is a conditional distribution over local stochastic steps, it typically has far fewer dominant modes than the full data distribution, which makes it easier to learn. This stands in contrast to traditional maximum likelihood methods that struggle with the intractable sums involved in marginalization or partition function estimation.
- Generalization of Denoising Autoencoders: GSNs extend the theoretical foundations of denoising autoencoders by incorporating latent variables, which increase the expressive power of the model. The paper proves that, when the reconstruction conditionals are trained appropriately, the stationary distribution of the resulting Markov chain is a consistent estimator of the data-generating distribution.
- Theoretical Insights: The paper presents theorems supporting the consistency of GSNs as estimators of the data-generating distribution. It also introduces a method to relax certain ergodicity constraints, which broadens the applicability and robustness of the GSN framework.
- Application to Structured Outputs and Missing Data: The GSN framework extends naturally to problems involving structured outputs or incomplete inputs. The paper shows how GSNs can sample missing inputs by treating them as stochastic variables in the Markov chain while the observed inputs are clamped to their known values, thus enabling conditional sampling (this corresponds to the clamping step in the sketch after this list).
- Implementation and Experiments: An example application of the GSN framework is demonstrated through experiments on image datasets. The GSN architecture employed mimics the Gibbs sampling used in Deep Boltzmann Machines, yet it is trained with standard backpropagation alone, without layerwise pretraining. The experimental results show that the model generates high-quality samples and captures the data distribution efficiently; a minimal training-and-sampling sketch follows this list.
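As a concrete, toy-scale illustration of the workflow described in the last two items, the sketch below trains a small denoising network with ordinary backprop and then generates samples by iterating corrupt-then-reconstruct; clamping a subset of dimensions to observed values turns the same chain into a conditional sampler for missing inputs. The network architecture, corruption process, and sizes are placeholders chosen for clarity, not the paper's actual experimental setup.

```python
import torch
import torch.nn as nn

# Toy denoising network standing in for a GSN's reconstruction distribution
# P(x | x_corrupted); the architecture and sizes are illustrative placeholders.
class Denoiser(nn.Module):
    def __init__(self, dim=784, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim), nn.Sigmoid(),  # outputs in [0, 1] for binary-like data
        )

    def forward(self, x_corrupted):
        return self.net(x_corrupted)

def corrupt(x, noise_std=0.5):
    """Fixed corruption process C(x_tilde | x): here, additive Gaussian noise."""
    return x + noise_std * torch.randn_like(x)

def train_step(model, optimizer, x_batch):
    """Supervised-style objective: reconstruct the clean x from its corrupted version."""
    x_tilde = corrupt(x_batch)
    x_hat = model(x_tilde)
    loss = nn.functional.binary_cross_entropy(x_hat, x_batch)
    optimizer.zero_grad()
    loss.backward()  # plain backprop; no partition function, no layerwise pretraining
    optimizer.step()
    return loss.item()

@torch.no_grad()
def sample_chain(model, x0, steps=100, clamp_mask=None, clamp_values=None):
    """Generate by running the learned Markov chain: corrupt, then reconstruct.

    If clamp_mask/clamp_values are given, the observed dimensions are reset to
    their known values after every step, so the chain samples the missing
    dimensions conditionally on the observed ones.
    """
    x = x0.clone()
    for _ in range(steps):
        x = model(corrupt(x))
        x = torch.bernoulli(x)  # sample from the output distribution
        if clamp_mask is not None:
            x = torch.where(clamp_mask, clamp_values, x)
    return x
```

With real data, one would loop `train_step` over minibatches and then call `sample_chain` starting from noise or from a partially observed example. The point mirrored from the paper is that training only ever touches the conditional reconstruction term, never an intractable partition function.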
Implications and Future Directions
The introduction of GSNs has significant implications for both the theory and practice of machine learning. Practically, GSNs provide a way to train deep generative models that is computationally feasible and less reliant on approximating the intractable sums that arise in traditional probabilistic models. Theoretically, the insights from this work may spur further exploration of more robust training frameworks for deep generative models, potentially advancing the modeling of complex distributions such as those arising in high-dimensional or structured data.
Future directions may include exploring more sophisticated architectures and loss functions within the GSN framework, as well as extending the methodology to other applications in AI, such as natural language processing or time-series analysis. Additionally, further investigation into the choice of corruption process and transition operator could yield deeper insights and more efficient models. The paper lays solid groundwork for such explorations, pointing toward more efficient and powerful generative models.