Learning Disentangled Representations with Semi-Supervised Deep Generative Models
The paper "Learning Disentangled Representations with Semi-Supervised Deep Generative Models" presents a novel approach to learning disentangled data representations by extending the framework of Variational Autoencoders (VAEs). The core advancement lies in employing a generalizable model architecture that integrates elements of probabilistic graphical models into the encoding and decoding processes, specifically targeting the separation of latent variable representations for different data aspects. This paper posits that this approach can lead to more interpretable and semantically consistent representations. The framework supports the employment of partially-specified models, balancing between structured graphical model benefits and the flexibility of deep neural networks for unexplored variables.
Methodology
The proposed model builds on the standard VAE but imposes a graphical model structure on its probabilistic encoder and decoder, allowing a fine-grained separation of latent variables. The encoder approximates the posterior distribution while partitioning distinct features of the data into separate latent variables. Of particular note is the semi-supervised setting, in which only a fraction of the data carries labels for the structured variables. The paper defines a general objective for semi-supervised learning in these generative models and approximates it with an importance-sampling estimator. The design accommodates arbitrary dependency structures, requiring only minimal specification of the relationships among latent variables.
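To make the supervised part of such an objective concrete, the following is a single-sample sketch in the spirit of the paper's importance-sampling estimator, simplified to a Gaussian latent with an analytic KL and an added classification term so q(y|x) also learns from labelled data. The function names and the omission of the usual weighting on the classifier term are assumptions, not the authors' code.

```python
# Hedged single-sample sketch of a supervised variational objective.
# `encoder`, `decoder`, and `classifier` are assumed callables:
# encoder(x, y) -> (mu, log_var), decoder(y, z) -> Bernoulli means,
# classifier(x) -> class probabilities.
import torch

def supervised_elbo(x, y_onehot, encoder, decoder, classifier):
    # q(z | x, y): reparameterised Gaussian sample.
    mu, log_var = encoder(x, y_onehot)
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()

    # log p(x | y, z): Bernoulli likelihood of the data.
    recon = decoder(y_onehot, z)
    log_px = torch.sum(
        x * torch.log(recon + 1e-8) + (1 - x) * torch.log(1 - recon + 1e-8),
        dim=-1)

    # Analytic KL(q(z|x,y) || N(0, I)) for the unstructured latent.
    kl_z = 0.5 * torch.sum(mu.pow(2) + log_var.exp() - 1 - log_var, dim=-1)

    # Classification term so the labelled data also trains q(y | x).
    log_qy = torch.sum(y_onehot * torch.log(classifier(x) + 1e-8), dim=-1)

    return (log_px - kl_z + log_qy).mean()
```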
Experimental Results
Quantitative evaluations cover classification, regression, and generative synthesis on datasets including MNIST, SVHN, and Yale B faces. On MNIST and SVHN, the model performs comparably to state-of-the-art semi-supervised models, achieving classification error rates as low as 1.57% on MNIST. On the Yale B dataset, lighting-direction regression and identity classification improve on prior methods. Notably, the model captures interpretable factors of variation, as evidenced by its ability to generate analogies: manipulating the disentangled latent space yields comprehensible changes in the reconstructed data.
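The analogy-generation procedure can be sketched as follows: infer the style latent z from an image and its observed label, then decode that same z under every possible label. The function name and signatures below are assumptions consistent with the earlier sketches, not the authors' code.

```python
# Illustrative analogy generation via the disentangled latents:
# hold the inferred style z fixed and sweep the label y.
import torch

@torch.no_grad()
def generate_analogies(x, y_true, encoder, decoder, num_classes=10):
    # Infer the style latent once, conditioned on the observed label.
    mu, _ = encoder(x, y_true)
    analogies = []
    for k in range(num_classes):
        y = torch.zeros(x.size(0), num_classes, device=x.device)
        y[:, k] = 1.0
        analogies.append(decoder(y, mu))  # same style z, swapped label y
    return torch.stack(analogies)  # one reconstruction per class label
```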
Implications and Future Work
The implications of this research span both theoretical and practical domains. Theoretically, the paper delineates a pathway for integrating structured graphical models within neural architectures, facilitating more interpretable AI systems. Practically, improved data representations could bolster applications that rely on data synthesis and interpretation, such as computer vision, healthcare, and autonomous systems.
Looking forward, potential advancements could involve exploring probabilistic programming frameworks to incorporate more complex and recursive model interactions. Utilizing the flexibility and expressiveness of probabilistic programming might lead to the automated generation of even more sophisticated models with less manual specification, bringing about a synergy between structured probabilistic methods and deep learning.
Overall, this paper contributes significantly to the ongoing exploration of AI models that learn robust, interpretable, and task-transferable data representations, advancing the understanding of how semi-supervised generative models can harness both the structure of probabilistic models and the adaptability of deep learning architectures.