- The paper introduces the Reweighted Wake-Sleep (RWS) algorithm, which uses multiple importance samples to yield less biased estimators of the likelihood gradient.
- It demonstrates significant performance improvements over the classic wake-sleep method on benchmarks like MNIST and CalTech Silhouettes.
- Using more expressive conditional layer models such as NADE improves the quality of the inference network, bridging traditional methods and state-of-the-art approaches.
Reweighted Wake-Sleep: Enhancements in Training Deep Directed Graphical Models
The paper, by Jörg Bornschein and Yoshua Bengio, addresses the challenge of training deep directed graphical models with many layers of latent variables, such as Helmholtz machines and deep belief networks (DBNs). The wake-sleep algorithm, an established method for training these models, is re-examined through the lens of importance sampling to obtain better estimators of the likelihood gradient.
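In importance-sampling terms, the core identity behind this re-examination can be stated compactly. The notation below is ours, following the paper's setup of a generative model p_theta, an inference network q_phi, and K samples:

```latex
% K-sample importance-sampling estimate of the marginal likelihood
p_\theta(x)
  = \mathbb{E}_{h \sim q_\phi(h \mid x)}\!\left[\frac{p_\theta(x,h)}{q_\phi(h \mid x)}\right]
  \approx \frac{1}{K}\sum_{k=1}^{K} \omega_k,
\qquad
\omega_k = \frac{p_\theta(x, h^{(k)})}{q_\phi(h^{(k)} \mid x)},
\quad h^{(k)} \sim q_\phi(h \mid x)

% gradient estimate with self-normalized weights
\nabla_\theta \log p_\theta(x)
  \approx \sum_{k=1}^{K} \widetilde{\omega}_k\, \nabla_\theta \log p_\theta(x, h^{(k)}),
\qquad
\widetilde{\omega}_k = \frac{\omega_k}{\sum_{j=1}^{K} \omega_j}
```

With K = 1 the single weight cancels after normalization and the classic wake-sleep update is recovered; increasing K shrinks the bias of the self-normalized estimator.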
Core Contributions
- Novel Interpretation of the Wake-Sleep Algorithm: The authors reinterpret the wake-sleep updates through an importance-sampling lens: drawing the latent variables several times from the inference network and reweighting the samples yields a less biased estimator of the likelihood gradient, one that approaches unbiasedness as the sample count increases (see the sketch after this list).
- Reweighted Wake-Sleep (RWS) Algorithm: The paper introduces a generalization of wake-sleep, termed Reweighted Wake-Sleep (RWS), which uses multiple samples to mitigate the bias in the likelihood gradient estimate. Empirical results show that even five samples (K=5) yield significant improvements over classic wake-sleep, which corresponds to K=1.
- Improvement through Layer Model Selection: The experiments show that the inference network can be considerably improved by employing more powerful conditional layer models, such as the Neural Autoregressive Distribution Estimator (NADE), in place of simple sigmoid belief network (SBN) layers, yielding a better approximation of the posterior distribution over the latent variables.
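To make the reweighting concrete, here is a minimal NumPy sketch of the wake-phase weight computation; `sample_q`, `log_p_joint`, and `log_q` are hypothetical callables standing in for the inference network's sampler and the two log-densities:

```python
import numpy as np

def rws_wake_weights(x, sample_q, log_p_joint, log_q, K=5):
    """Normalized importance weights for the RWS wake-phase update.

    Draws K latent samples h_k ~ q(h | x) from the inference network and
    weights each by w_k = p(x, h_k) / q(h_k | x); K = 1 recovers the
    classic wake-sleep update.
    """
    hs = [sample_q(x) for _ in range(K)]                      # h_k ~ q(h | x)
    log_w = np.array([log_p_joint(x, h) - log_q(h, x) for h in hs])
    log_w -= log_w.max()              # subtract max for numerical stability
    w = np.exp(log_w)
    return hs, w / w.sum()            # samples and normalized weights
```

The gradient for the generative parameters is then the weight-averaged sum of per-sample gradients of log p(x, h_k); the same weights also serve for the wake-phase update of the inference network that the paper adds alongside the classic sleep-phase update.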
Experimental Validation
The efficacy of RWS is demonstrated through experiments on the MNIST dataset and other binary datasets. The paper reports dramatic improvements in log-likelihood estimates with RWS over traditional wake-sleep, and performance comparable to more recent approaches such as Variational Auto-Encoders and Deep Autoregressive Networks. In particular, employing more expressive layer models within the RWS framework pushes the log-likelihood results closer to the state of the art.
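Log-likelihoods for models of this kind are themselves importance-sampling estimates, computed in log space for numerical stability; a sketch using the same hypothetical callables as above:

```python
import numpy as np
from scipy.special import logsumexp

def estimate_log_px(x, sample_q, log_p_joint, log_q, K=1000):
    """Importance-sampling estimate of log p(x): log (1/K) sum_k w_k."""
    samples = [sample_q(x) for _ in range(K)]          # h_k ~ q(h | x)
    log_w = np.array([log_p_joint(x, h) - log_q(h, x) for h in samples])
    return logsumexp(log_w) - np.log(K)                # stable in log space
```

By Jensen's inequality this estimate is biased downward, so it is conservative; it becomes exact as K grows.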
- MNIST Benchmarking: Models trained with RWS showed superior performance over their wake-sleep counterparts and closely compete with leading models when given deeper architectures and more expressive layer designs such as NADE (see the sketch after this list).
- CalTech Silhouettes Dataset: Similar improvements are observed, with the RWS-trained models outperforming RBMs, which were previously a competitive benchmark on this dataset.
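As a reference for what a more powerful layer model buys, here is a minimal NumPy sketch of the NADE log-probability computation for a binary vector (parameter shapes are our choice: D-by-H matrices `W` and `V`, bias vectors `b` and `c`). Unlike a fully factorial SBN output, each unit here conditions on all preceding units:

```python
import numpy as np

def nade_log_prob(x, W, V, b, c):
    """Log-probability of a binary vector x under a NADE model.

    Each conditional p(x_i = 1 | x_{<i}) shares a running hidden
    pre-activation, so the full product of conditionals costs O(D * H)
    rather than D separate hidden-layer passes.
    """
    D, H = W.shape        # W: input-to-hidden, V: hidden-to-output
    a = c.copy()          # pre-activation given x_{<i}; starts at the bias
    log_p = 0.0
    for i in range(D):
        h = 1.0 / (1.0 + np.exp(-a))                     # hidden units
        p_i = 1.0 / (1.0 + np.exp(-(b[i] + V[i] @ h)))   # p(x_i=1 | x_{<i})
        log_p += x[i] * np.log(p_i) + (1 - x[i]) * np.log(1.0 - p_i)
        a = a + W[i] * x[i]      # fold x_i in for the next conditional
    return log_p
```

In the deep models considered here, a conditional variant of this computation is used as a layer, with the biases additionally driven by the layer above.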
Implications and Future Outlook
By addressing the bias of conventional wake-sleep updates, the proposed RWS algorithm suggests a promising direction for training generative models. It bridges the gap between traditional and contemporary approaches by reducing estimation bias without significantly sacrificing computational efficiency.
Looking forward, there are clear implications for how inference architectures can be designed: the results underline the value of flexible, powerful layer models for capturing the complex posterior distributions that arise in practice. Future research could extend RWS to continuous latent variables and further explore the computational cost of deeper network structures.
Conclusion
In conclusion, the paper delivers substantive progress in deep generative modeling. By reinterpreting an established algorithm and validating the result with careful experiments, Bornschein and Bengio provide valuable insight into making deep probabilistic models more tractable and effective. The RWS algorithm offers a robust framework that extends the capability and reliability of training for directed graphical models.