- The paper introduces a generalized denoising auto-encoder framework that extends traditional DAEs to estimate data-generating distributions using arbitrary corruption and loss functions.
- It defines a Markov chain whose stationary distribution consistently estimates the data-generating distribution, together with a walkback training procedure that reduces spurious modes, as demonstrated on benchmarks like MNIST.
- Experimental results validate the approach’s ability to effectively handle both discrete and continuous data, advancing unsupervised generative modeling techniques.
Generalized Denoising Auto-Encoders as Generative Models
The paper "Generalized Denoising Auto-Encoders as Generative Models" by Bengio et al. builds on the foundation of auto-encoders, particularly their role in capturing data distribution in unsupervised learning contexts. The authors propose an extended framework for Denoising Auto-Encoders (DAEs) capable of addressing limitations in both discrete and continuous data. This approach aims to offer a broader probabilistic interpretation applicable to various types of corruption and reconstruction loss functions.
Key Contributions
The primary contribution is a generalized theoretical framework in which DAEs estimate the data-generating distribution. The authors close several gaps in the existing literature:
- General Corruption and Loss Functions: The method permits arbitrary corruption processes and reconstruction losses, unlike previous results that required Gaussian corruption and squared error. This makes it possible to handle mixed data types (continuous and discrete); a training-and-sampling sketch follows this list.
- Non-infinitesimal Corruption: Earlier theoretical guarantees hold only in the limit of infinitesimal corruption noise (or contractive penalty). The generalized framework remains valid at large noise levels, which are often more effective in practice.
- Markov Chain Sampling: The paper defines a Markov chain that alternates between the corruption process C(X̃|X) and the learned reconstruction distribution P(X|X̃). Under mild conditions, its stationary distribution is a consistent estimator of the training data's true distribution, as illustrated in the sketch below.
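To make the training criterion and the sampling chain concrete, here is a minimal PyTorch sketch. It is not the authors' code: the single-hidden-layer architecture, the salt-and-pepper corruption rate, and the cross-entropy loss are illustrative assumptions, chosen so the loss can be read as the negative log-likelihood of a factorized Bernoulli reconstruction distribution P(X|X̃).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def salt_and_pepper(x, rate=0.3):
    """Corruption C(X_tilde | X): with probability `rate`, each bit is
    replaced by a fair coin flip (the rate is an assumption)."""
    mask = (torch.rand_like(x) < rate).float()
    coin = (torch.rand_like(x) < 0.5).float()
    return x * (1 - mask) + coin * mask

class DAE(nn.Module):
    """One hidden layer; the decoder outputs per-pixel Bernoulli means,
    so P(X | X_tilde) is a factorized multivariate Bernoulli."""
    def __init__(self, n_visible, n_hidden=256):
        super().__init__()
        self.enc = nn.Linear(n_visible, n_hidden)
        self.dec = nn.Linear(n_hidden, n_visible)

    def forward(self, x_tilde):
        h = torch.sigmoid(self.enc(x_tilde))
        return torch.sigmoid(self.dec(h))

def train_step(model, opt, x):
    """Generalized DAE criterion: corrupt, reconstruct, and minimize a
    loss interpretable as -log P(X = x | X_tilde)."""
    x_tilde = salt_and_pepper(x)
    p = model(x_tilde)
    loss = F.binary_cross_entropy(p, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def sample_chain(model, x0, n_steps=100):
    """Generative Markov chain: alternate X_tilde ~ C(. | X) and
    X ~ P(. | X_tilde); its stationary distribution approximates
    the data-generating distribution."""
    x = x0
    for _ in range(n_steps):
        x_tilde = salt_and_pepper(x)
        x = torch.bernoulli(model(x_tilde))
    return x
```

Because the corruption and loss are unconstrained in this framework, swapping salt-and-pepper for Gaussian noise and cross-entropy for squared error would leave the structure of both the training step and the sampling chain unchanged.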
Experimental Results
The authors validate their theoretical claims through experiments in both non-parametric and parametric settings:
- Non-Parametric Validation: A low-dimensional discrete example demonstrates that the approach recovers the data distribution when the reconstruction distribution is estimated non-parametrically, via maximum-likelihood multinomials fit to (corrupted, clean) pairs; a sketch of this estimator follows this list.
- Parametric Evaluation with MNIST: DAEs were trained on the MNIST dataset, comparing standard training against the proposed walkback procedure (sketched after this list). Samples generated with walkback exhibited fewer spurious modes, and quantitative assessment via non-parametric density estimates confirmed improved log-likelihood.
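For the non-parametric validation, the maximum-likelihood multinomial estimate of P(X|X̃) over a small discrete space is simply a normalized count matrix. The sketch below illustrates that idea and is not the paper's exact setup; the `corrupt` function and value range are assumptions for a 1-D toy problem.

```python
import numpy as np

def fit_multinomial_reconstruction(xs, corrupt, n_values, n_pairs=10000):
    """MLE of the reconstruction distribution P(X | X_tilde) for discrete
    data: count (corrupted, clean) pairs and normalize each row."""
    counts = np.zeros((n_values, n_values))
    for _ in range(n_pairs):
        x = int(np.random.choice(xs))   # draw a training example
        x_tilde = corrupt(x)            # sample from C(X_tilde | X)
        counts[x_tilde, x] += 1
    row_sums = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    return counts / row_sums            # row i is P(X | X_tilde = i)
```

The walkback procedure can be sketched as a variant of `train_step` from the earlier snippet (reusing `salt_and_pepper` and the `DAE` class defined there). This is an interpretation under assumptions, not the authors' exact algorithm: the geometric stopping rule with `p_continue` stands in for the paper's random-length walk. The key idea is that every state visited along the model's own chain is trained to reconstruct the original clean example, so spurious modes the sampler wanders into are actively corrected.

```python
import random
import torch
import torch.nn.functional as F

def walkback_step(model, opt, x, p_continue=0.5):
    """Walkback training: follow the model's own corrupt/reconstruct
    chain away from x and train P(X | X_tilde) to map every visited
    state back to the clean x."""
    x_tilde = salt_and_pepper(x)             # from the earlier sketch
    total_loss = 0.0
    while True:
        p = model(x_tilde)
        loss = F.binary_cross_entropy(p, x)  # target is always clean x
        opt.zero_grad()
        loss.backward()
        opt.step()
        total_loss += loss.item()
        if random.random() > p_continue:     # geometric stopping (assumed)
            break
        with torch.no_grad():                # continue the chain from the
            x_resample = torch.bernoulli(model(x_tilde))
            x_tilde = salt_and_pepper(x_resample)
    return total_loss
```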
Implications and Future Directions
This work broadens the applicability of DAEs as generative models: because the corruption process and reconstruction loss are unconstrained, the framework rigorously approximates complex distributions over mixed data types and remains valid at the large noise levels common in practice. The experiments suggest corresponding gains in both model robustness and sample quality.
Future work could enhance the multi-modality of the reconstruction distribution, for example by parameterizing it with a conditional model such as NADE. Building deeper networks on this generalized framework may also yield richer hierarchical representations and better-scaling sampling procedures, akin to those of deep belief networks.
The methodological innovations and empirical validations presented here mark a substantial step toward realizing the potential of auto-encoders as generative models.