A Deep and Tractable Density Estimator (1310.1757v2)

Published 7 Oct 2013 in stat.ML and cs.LG

Abstract: The Neural Autoregressive Distribution Estimator (NADE) and its real-valued version RNADE are competitive density models of multidimensional data across a variety of domains. These models use a fixed, arbitrary ordering of the data dimensions. One can easily condition on variables at the beginning of the ordering and marginalize out variables at the end of the ordering; however, other inference tasks require approximate inference. In this work we introduce an efficient procedure to simultaneously train a NADE model for each possible ordering of the variables, by sharing parameters across all these models. We can thus use the most convenient model for each inference task at hand, and ensembles of such models with different orderings are immediately available. Moreover, unlike the original NADE, our training procedure scales to deep models. Empirically, ensembles of Deep NADE models obtain state-of-the-art density estimation performance.

Citations (180)


Summary

The paper presents an efficient, order-agnostic training procedure for the Neural Autoregressive Distribution Estimator (NADE) and its real-valued variant RNADE. The procedure shares parameters across all variable orderings, which makes deep yet tractable models possible.
Empirically, ensembles of order-agnostic NADEs achieve competitive or state-of-the-art log-likelihoods on several datasets and surpass Restricted Boltzmann Machines on binarized-MNIST.
Because a model is available for every ordering, the method simplifies inference tasks such as marginalization and sampling, and the computational cost of the deep models scales linearly with the number of layers.

A Deep and Tractable Density Estimator

The paper "A Deep and Tractable Density Estimator" addresses a limitation of the Neural Autoregressive Distribution Estimator (NADE) and its real-valued variant RNADE: each model is tied to a fixed, arbitrary ordering of the data dimensions, so the only inference tasks that are easy are conditioning on variables at the start of the ordering and marginalizing out variables at its end. The authors present a training procedure that removes this restriction and scales to deep models.

The primary contribution is an efficient procedure for training a NADE model for every possible ordering of the variables. Because parameters are shared across all orderings, the factorially many models are trained simultaneously with a single set of parameters, rather than as separate models, which would be computationally infeasible. Ensembles of models with different orderings are then immediately available and provide state-of-the-art density estimation. The same procedure supports deep architectures: deep versions of NADE remain tractable, with computation that scales linearly with the number of layers.
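As an illustration of how one set of parameters can be trained for all orderings, here is a minimal NumPy sketch of a one-sample order-agnostic loss for binary data. It assumes the objective samples a random ordering and a random split point, shows the model only the dimensions before the split (together with the mask), and rescales the loss on the remaining dimensions. The summary above does not spell out the objective, so treat this as an assumption; `order_agnostic_loss`, `predict_fn`, and `uniform_predict` are hypothetical names, not the authors' code.

```python
import numpy as np


def order_agnostic_loss(x, predict_fn, rng):
    """One-sample estimate of the negative log-likelihood of a binary vector x.

    predict_fn(masked_x, mask) is a hypothetical model interface that returns
    one Bernoulli probability per dimension.
    """
    D = x.shape[0]
    order = rng.permutation(D)        # random ordering of the dimensions
    n_obs = rng.integers(0, D)        # how many leading dimensions are observed (0..D-1)
    mask = np.zeros(D)
    mask[order[:n_obs]] = 1.0         # 1 = observed, 0 = to be predicted

    p = np.clip(predict_fn(x * mask, mask), 1e-7, 1 - 1e-7)
    nll = -(x * np.log(p) + (1 - x) * np.log(1 - p))

    # Sum the loss over the unobserved dimensions and rescale by D / (D - n_obs),
    # so that every split point contributes on the scale of a full D-term
    # log-likelihood.
    return D / (D - n_obs) * np.sum(nll * (1.0 - mask))


def uniform_predict(masked_x, mask):
    """Toy stand-in for a network: predicts probability 0.5 everywhere."""
    return np.full(mask.shape, 0.5)


rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
loss = order_agnostic_loss(x, uniform_predict, rng)
```

In a real model, `predict_fn` would be a (possibly deep) network whose parameters are shared across every ordering and split point; minimizing the average of this loss over data, orderings, and split points is what trains all orderings at once under the stated assumption.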

The approach is applied to both binary and real-valued datasets. Empirically, NADEs trained with the order-agnostic procedure are competitive with models trained for a single fixed ordering, and ensembles over different orderings outperform them on multiple datasets, which indicates that the parameter-sharing scheme is useful in practice.

The method is evaluated on binary UCI datasets, binarized-MNIST, and natural image patches. Highlights include:

Improved log-likelihoods on binary datasets through ensemble averaging and parameter sharing.
On binarized-MNIST, ensembles of NADEs surpass the estimated performance of Restricted Boltzmann Machines (RBMs) and approach that of Deep Belief Networks (DBNs).
On natural image patches, deep RNADE models exceed the previous state of the art, showing that they can model complex real-valued data.
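The ensemble averaging mentioned in these results can be sketched as follows, assuming the ensemble is a uniform mixture of the per-ordering models, so that probabilities (not log-probabilities) are averaged. The function name and the input values are illustrative, not taken from the paper.

```python
import numpy as np


def ensemble_log_likelihood(log_probs_per_ordering):
    """Log-probability of one data point under a uniform mixture of models.

    log_probs_per_ordering holds log p_o(x) for each ordering o in the ensemble.
    The mean is taken in probability space, using the max-shift trick for
    numerical stability.
    """
    lp = np.asarray(log_probs_per_ordering, dtype=float)
    m = lp.max()
    return m + np.log(np.mean(np.exp(lp - m)))


# Two hypothetical orderings assign the same point probabilities 0.2 and 0.4.
ens = ensemble_log_likelihood([np.log(0.2), np.log(0.4)])
```

By Jensen's inequality the mixture's log-likelihood is never lower than the average of the individual log-likelihoods, which is one reason averaging over orderings can help.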

The work has both practical and methodological implications. Practically, having a model for every ordering means that marginalization and sampling tasks can be handled efficiently by choosing a suitable ordering. Methodologically, it offers a scalable way to train deep autoregressive models and changes how such density estimators are trained and used.
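Choosing a convenient ordering for a given inference task can be sketched in a few lines. This follows the abstract's observation that conditioning is easy for variables at the beginning of an ordering and marginalization is easy for variables at the end; the helper `convenient_ordering` and its arguments are hypothetical.

```python
def convenient_ordering(n_dims, observed, marginalized):
    """Build an ordering: observed variables first, marginalized variables last.

    Variables that are neither observed nor marginalized (the query variables)
    are placed in the middle, in index order.
    """
    observed = list(observed)
    marginalized = list(marginalized)
    if set(observed) & set(marginalized):
        raise ValueError("a variable cannot be both observed and marginalized")
    excluded = set(observed) | set(marginalized)
    query = [i for i in range(n_dims) if i not in excluded]
    return observed + query + marginalized


# Condition on variable 3, marginalize out variables 0 and 4, query 1 and 2.
ordering = convenient_ordering(5, observed=[3], marginalized=[0, 4])
```

With an order-agnostic model, the ordering returned here could be used directly at inference time, since a model for that ordering is already available through the shared parameters.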

Future work could extend these models to structured multidimensional data, incorporate further advances in deep learning to improve performance, and investigate the limits of the parameter-sharing scheme in real-world settings. More broadly, the gains in tractability make deep density estimators more practical to use.
