- The paper demonstrates a generative approach for zero-shot learning by using CVAEs to synthesize image features for unseen classes.
- It leverages a non-linear neural architecture that generates pseudo-data from class attributes, effectively addressing domain shift issues.
- Empirical evaluations on benchmarks including AwA, CUB, SUN, and ImageNet show improved performance, with accuracy reaching 71.4% on AwA-1.
A Generative Model for Zero Shot Learning Using Conditional Variational Autoencoders
Zero-Shot Learning (ZSL) is a core challenge in image classification, arising when the training data does not cover all classes that may appear at test time. This paper introduces an approach that leverages Conditional Variational Autoencoders (CVAEs) to tackle the ZSL problem. Unlike traditional methods that learn a transfer function mapping between the class-attribute space and the image space, this work proposes to generate sample data from class attributes using CVAEs, and then to use these synthetic samples to classify unseen classes.
Methodology
The core proposition of this research is to view the ZSL problem as a missing-data problem. A Conditional Variational Autoencoder (CVAE) is employed to model the probability distribution of image features conditioned on a class-embedding vector: features are generated conditioned on the semantic embedding of a class, using a non-linear architecture with neural networks for both the encoder and the decoder. Crucially, this departs from linear compatibility models by attempting to synthesize the data features of new classes directly, an approach better suited to the complex, multimodal nature of the image feature space.
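To make the setup concrete, below is a minimal sketch in PyTorch of a CVAE whose encoder and decoder are both conditioned on the class-attribute vector, trained by maximizing the conditional evidence lower bound E[log p(x|z,a)] − KL(q(z|x,a) ‖ p(z)). The dimensions (2048-d image features as from a ResNet backbone, 85-d attributes as in AwA) and layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal CVAE sketch for feature generation conditioned on class attributes.
# Layer sizes are illustrative; the paper's exact configuration may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    def __init__(self, x_dim=2048, a_dim=85, z_dim=50, h_dim=1000):
        super().__init__()
        # Encoder q(z | x, a): maps [feature, attribute] to a latent Gaussian.
        self.enc = nn.Sequential(nn.Linear(x_dim + a_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        # Decoder p(x | z, a): reconstructs a feature from latent + attribute.
        self.dec = nn.Sequential(
            nn.Linear(z_dim + a_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim),
        )

    def forward(self, x, a):
        h = self.enc(torch.cat([x, a], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        x_hat = self.dec(torch.cat([z, a], dim=1))
        return x_hat, mu, logvar

def cvae_loss(x, x_hat, mu, logvar):
    # Negated conditional ELBO: reconstruction error plus the KL divergence
    # from the approximate posterior to the N(0, I) prior.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Because both networks see the attribute vector, the decoder alone can later be driven by the attributes of a class that contributed no training images.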
Evaluation and Results
Empirical validation is carried out on four benchmark datasets: Animals with Attributes (AwA-1 and AwA-2), the CUB-200-2011 bird dataset (CUB), and the SUN Attribute dataset. To assess the model's scalability, results are also reported on the large-scale ImageNet dataset. The pseudo data generated by the CVAE allows training of conventional classifiers (an SVM in this paper) that outperform existing models, particularly in the generalized ZSL setting where both seen and unseen classes can appear at test time. The method achieves 71.4% and 65.8% accuracy on AwA-1 and AwA-2 respectively, improving over prior approaches.
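The classification stage can be sketched as follows: sample latent codes from the prior, decode them with each unseen class's attribute vector to obtain pseudo features, and fit an SVM on the synthetic set. This assumes the hypothetical `CVAE` model above and a dict `unseen_attrs` mapping class labels to attribute tensors; the sample count and use of a linear SVM are illustrative choices, not the paper's exact code.

```python
# Hedged sketch: synthesize pseudo features for unseen classes, then train
# an SVM on them. Real test features are classified with clf.predict(...).
import torch
from sklearn.svm import LinearSVC

def synthesize_and_train(model, unseen_attrs, n_per_class=200, z_dim=50):
    model.eval()
    feats, labels = [], []
    with torch.no_grad():
        for label, a in unseen_attrs.items():
            z = torch.randn(n_per_class, z_dim)            # z ~ N(0, I)
            a_rep = a.unsqueeze(0).expand(n_per_class, -1)  # repeat attribute
            x_fake = model.dec(torch.cat([z, a_rep], dim=1))
            feats.append(x_fake)
            labels += [label] * n_per_class
    X = torch.cat(feats).numpy()
    clf = LinearSVC()  # linear SVM over the synthetic feature set
    clf.fit(X, labels)
    return clf
```

In the generalized setting, the SVM's training set would also include seen-class features (real or synthesized), so the classifier can discriminate over the full label space at test time.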
Insights and Implications
The use of a CVAE addresses the domain-shift problem inherent in ZSL, where mappings learned on seen classes may fail to transfer to unseen ones. By statistically modeling the generation of image features rather than learning a fixed mapping, the CVAE adapts better to the generalized setting than previous methods, which largely succeeded only in settings constrained to disjoint class sets. The approach provides evidence that generative models such as CVAEs have the generalization ability needed for unseen-class prediction tasks.
Future Directions
Progress in ZSL using generative models opens several directions worth investigating. These include improving condition-specific feature generation by addressing mode collapse, a limitation observed in this paper's visualization analysis. Another avenue is refining class-attribute embeddings, perhaps through unsupervised means or by exploiting richer corpora such as encyclopedic textual descriptions. Additionally, experimenting with other generative frameworks, such as GANs, may exhibit different performance characteristics in this setting.
In summary, the research makes a substantial contribution to ZSL by demonstrating that CVAEs can synthesize training data for novel classes, offering a viable solution to a significant problem in computer vision. The results strengthen the case for applying probabilistic generative models to emerging classification tasks in AI.