Semi-Supervised Learning with Deep Generative Models (1406.5298v2)

Published 20 Jun 2014 in cs.LG and stat.ML

Abstract: The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. Generative approaches have thus far been either inflexible, inefficient or non-scalable. We show that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.

Citations (2,665)

Summary

  • The paper introduces a novel framework that integrates deep generative models with variational inference for effective semi-supervised learning.
  • The paper demonstrates that combining latent-feature embeddings with generative processes significantly improves classification accuracy on benchmarks like MNIST.
  • The paper leverages scalable stochastic gradient variational Bayes to learn efficiently from limited labeled data, opening applications across diverse fields.

Semi-supervised Learning with Deep Generative Models

This paper, authored by Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, and Max Welling, explores the problem of semi-supervised learning (SSL) using deep generative models and advances in variational inference methods. The primary focus is on effectively generalizing from small labeled datasets to larger unlabeled ones, thereby addressing a significant issue in modern data analysis where labeled data is often scarce.

Introduction and Context

The SSL problem of classifying data when only a subset has corresponding class labels arises frequently in practice, especially in fields like image search, genomics, natural language parsing, and speech analysis. Traditional SSL methods include self-training schemes, transductive SVMs (TSVM), and graph-based methods, each with varying degrees of success and scalability issues. More recent approaches have leveraged neural networks and manifold learning. However, the authors identify a gap: the lack of a unified, scalable, and efficient generative approach to SSL.

Contributions

The paper makes several key contributions:

  1. Framework for SSL with Generative Models: It introduces a novel framework combining probabilistic modeling with deep neural networks to form advanced parametric density estimators.
  2. Variational Inference for SSL: It presents what the authors describe as the first application of variational inference to semi-supervised classification.
  3. State-of-the-Art Performance: Empirical results show substantial improvements over existing methods on benchmark problems.
  4. Qualitative Insights: The models qualitatively demonstrate their capacity to separate content from style in datasets, enabling analogical reasoning in image generation tasks.

Model Descriptions

Latent-feature Discriminative Model (M1)

This model provides an embedding or feature representation of the data through a deep generative approach (a variational autoencoder), capturing higher-order moments of the data via a nonlinear transformation. The embeddings cluster related observations together, improving classification accuracy even with few labels. Because the latent variables are lower-dimensional than the original data, classes become easier to separate. Features from this model are then used to train classifiers such as TSVMs, as sketched below.
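A minimal PyTorch sketch of this pipeline, under stated assumptions: layer sizes and names are illustrative rather than the paper's exact architecture, and scikit-learn's SVC stands in for the transductive SVM used in the paper.

```python
import torch
import torch.nn as nn
from sklearn.svm import SVC

class Encoder(nn.Module):
    """Inference network q(z | x): maps inputs to Gaussian latent parameters."""
    def __init__(self, x_dim=784, h_dim=500, z_dim=50):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Softplus())
        self.mu = nn.Linear(h_dim, z_dim)       # posterior mean
        self.log_var = nn.Linear(h_dim, z_dim)  # posterior log-variance

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.log_var(h)

# After the VAE (M1) is trained on all data, labelled and unlabelled,
# its posterior means serve as features for a classifier trained on
# the small labelled subset alone.
encoder = Encoder()                    # assume weights were already trained
x_lab = torch.rand(100, 784)           # toy stand-in for labelled images
y_lab = torch.randint(0, 10, (100,))   # toy stand-in for their labels
with torch.no_grad():
    z_mu, _ = encoder(x_lab)
clf = SVC().fit(z_mu.numpy(), y_lab.numpy())
```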

Generative Semi-supervised Model (M2)

This model integrates both class labels and continuous latent variables into a probabilistic framework. It treats classification as inference, marginalizing over unobserved class labels. The model effectively combines class-specific information with intra-class variabilities through a deep neural network likelihood function, providing a robust hybrid continuous-discrete mixture model.
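Concretely, following the paper's bounds, training uses the standard evidence lower bound when the label y is observed; for unlabelled data the unknown label is marginalized under the classifier q_φ(y|x), and an explicit classification loss weighted by a hyperparameter α ensures q_φ(y|x) also learns directly from the labelled pairs:

```latex
% ELBO for a labelled pair (x, y)
-\mathcal{L}(x,y) = \mathbb{E}_{q_\phi(z \mid x,y)}\bigl[\log p_\theta(x \mid y,z)
    + \log p(y) + \log p(z) - \log q_\phi(z \mid x,y)\bigr]

% Unlabelled data: marginalize the unknown label under the classifier
-\mathcal{U}(x) = \sum_{y} q_\phi(y \mid x)\bigl(-\mathcal{L}(x,y)\bigr)
    + \mathcal{H}\bigl(q_\phi(y \mid x)\bigr)

% Overall objective, with the classification term weighted by \alpha
\mathcal{J}^{\alpha} = \sum_{(x,y) \sim \tilde{p}_l} \mathcal{L}(x,y)
    + \sum_{x \sim \tilde{p}_u} \mathcal{U}(x)
    + \alpha \, \mathbb{E}_{\tilde{p}_l(x,y)}\bigl[-\log q_\phi(y \mid x)\bigr]
```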

Stacked Generative Model (M1+M2)

By stacking M1 and M2, the approach first uses M1 to learn an embedding and then runs M2's generative process in the latent space that M1 produces. The result is a deep generative model that operates on a lower-dimensional representation and delivers the strongest classification performance of the three variants.
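In the stacked variant, M2's generative process acts on M1's latent code z₁ rather than on raw pixels, so the joint distribution factorizes as:

```latex
p_\theta(x, y, z_1, z_2) = p(y)\, p(z_2)\, p_\theta(z_1 \mid y, z_2)\, p_\theta(x \mid z_1)
```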

Scalable Variational Inference

The paper employs scalable variational inference techniques, exploiting inference networks for fast and efficient posterior approximation. The inference networks, parameterized as deep neural networks, are utilized for both labeled and unlabeled data across all models, facilitating joint optimization for both generative and variational parameters. This approach avoids the per-data point optimizations typical of traditional variational EM algorithms, significantly boosting computational efficiency through forms of stochastic gradient variational Bayes (SGVB).
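At the core of this efficiency is the reparameterization trick behind SGVB. In the minimal sketch below (function names are illustrative), latent samples are written as a deterministic, differentiable function of the variational parameters and external noise, and the Gaussian KL term has a closed form, so the entire bound is trainable with ordinary minibatch SGD.

```python
import torch

def reparameterize(mu, log_var):
    """SGVB reparameterization: z = mu + sigma * eps with eps ~ N(0, I).

    Because z is a differentiable function of (mu, log_var), a single
    Monte Carlo sample yields an unbiased, low-variance gradient of the
    evidence lower bound with respect to the variational parameters.
    """
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2 I) || N(0, I) ), summed over latent dims."""
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=-1)
```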

Experimental Results

MNIST Classification Benchmark:

  • Models M1 and M2 each perform strongly on their own, with M1's features combined with a TSVM generalizing well from small labelled sets.
  • The combined M1+M2 model surpasses the previous state of the art (reaching roughly 3.3% test error with only 100 labelled examples), evidencing the effectiveness of the combined generative-discriminative approach to SSL.
  • For fully supervised learning, the combined model achieves a 0.96% error rate, competitive with leading methods in permutation-invariant MNIST tasks.

Conditional Generation:

The conditioned generative model enables exploration of data structure through analogical reasoning. Fixing one class label while varying latent variables, particularly in MNIST and SVHN datasets, demonstrates effective disentanglement of class-specific and style-specific features. This capability is valuable for generating new data samples that preserve underlying data distributions.
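A sketch of how such analogies can be produced with a trained M2. The `encoder` and `decoder` interfaces here are hypothetical stand-ins for the trained inference and generative networks, not the paper's code:

```python
import torch

def generate_analogies(encoder, decoder, x_seed, y_seed, num_classes=10):
    """Hold the inferred style latent fixed and sweep the class label.

    Assumed interfaces (hypothetical):
      encoder(x, y) -> (mu, log_var), approximating q(z | x, y)
      decoder(z, y_onehot) -> image, implementing p(x | y, z)
    x_seed is a single image (a batch of one).
    """
    with torch.no_grad():
        mu, _ = encoder(x_seed, y_seed)     # style latent of the seed image
        labels = torch.eye(num_classes)     # one one-hot label per class
        z = mu.expand(num_classes, -1)      # same style, repeated per class
        return decoder(z, labels)           # one image per class, same style
```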

Practical and Theoretical Implications

The implications of the proposed models are substantial:

  • Practical Use: The framework can be applied to a wide range of data-rich fields where labeled data is scarce.
  • Scalability: The models' performance scales well with increasing data sizes, making them feasible for large datasets.
  • Future Extensions: The method's extension to convolutional neural networks could integrate its benefits with the current gold standard in image classification. Additionally, improvements in model selection through variational approaches could enhance applications in small data settings.

Conclusion

The paper presents a novel, effective approach to SSL using deep generative models, supported by scalable variational inference. The significant improvement in benchmark results highlights the potential for further advancements in SSL methods. The approach paves the way for new investigations into generative models' capabilities in handling complex semi-supervised classification tasks.