Review of "Transductive Unbiased Embedding for Zero-Shot Learning"
The paper "Transductive Unbiased Embedding for Zero-Shot Learning" addresses a significant challenge in the field of Zero-Shot Learning (ZSL): the bias towards classifying instances of unseen (target) categories as one of the seen (source) categories in generalized ZSL settings. This issue often leads to substantial performance degradation, particularly when the model is applied in real-world scenarios where both seen and unseen classes must be considered. The authors propose a method called Quasi-Fully Supervised Learning (QFSL) to mitigate this bias by utilizing a transductive approach.
Methodology
The proposed QFSL method distinguishes itself by leveraging both labeled instances from source classes and unlabeled instances from target classes during training. Both are mapped into a shared semantic embedding space: source images are mapped to anchor points representing the source categories, while unlabeled target images are encouraged to map to the anchor points of the target categories. This dual use of data aims to produce an unbiased representation of both sets of classes in the semantic space. The authors implement the approach with a deep neural network trained end-to-end, so a conventional multi-layer architecture naturally accommodates source and target instances within a single objective.
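The objective described above can be sketched as a supervised classification loss on labeled source images plus a bias term that pushes unlabeled target images' probability mass onto the target classes. The sketch below is illustrative only: the linear anchor scoring, the function names, and the weighting scheme are simplifications, not the paper's exact architecture or loss.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax over class scores."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def qfsl_style_loss(feats, labels, is_source, anchors, n_source, lam=1.0):
    """Sketch of a quasi-fully supervised objective (illustrative, not the
    paper's exact formulation).

    feats     : (N, d) visual embeddings
    labels    : (N,) class indices (used only where is_source is True)
    is_source : (N,) bool mask; True for labeled source images
    anchors   : (C, d) semantic anchor points, source classes first
    n_source  : number of source classes
    lam       : weight on the bias (target-class probability) term
    """
    probs = softmax(feats @ anchors.T)            # (N, C) class probabilities
    # Supervised cross-entropy on the labeled source images.
    src = probs[is_source, labels[is_source]]
    ce = -np.log(src + 1e-12).mean()
    # Bias term: encourage unlabeled target images to place their
    # probability mass on the target classes (columns n_source..C-1).
    tgt_mass = probs[~is_source, n_source:].sum(axis=1)
    bias = -np.log(tgt_mass + 1e-12).mean()
    return ce + lam * bias
```

Minimizing the bias term directly counteracts the tendency to score unlabeled target instances as source classes, which is the core of the debiasing argument.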
Key Contributions
- Transductive Approach: By incorporating unlabeled target class data during training, the model significantly reduces the bias towards source classes, a common drawback in previous ZSL approaches.
- Improved Accuracy: The QFSL method demonstrates significant improvements in accuracy over state-of-the-art methods on established benchmarks.
- Flexibility in Semantic Spaces: The method utilizes semantic spaces defined by attributes, but the underlying principles are adaptable to other representations such as word vectors.
- Experimental Validation: The method was tested across multiple datasets, namely AwA2, CUB, and SUN, showing improvements in both conventional and generalized settings.
Results
The empirical results on the AwA2, CUB, and SUN datasets confirm the effectiveness of the QFSL method. The paper reports accuracy gains of 9.3% to 24.5% in generalized ZSL settings and of 0.2% to 16.2% in conventional ZSL settings over existing methods. These gains are notable because they hold across datasets that differ in granularity and class distribution.
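Generalized ZSL performance is commonly summarized with the harmonic mean of seen- and unseen-class accuracy, a metric that collapses when a model exhibits the source-class bias this paper targets. A minimal illustration with hypothetical accuracy values (not numbers from the paper):

```python
def harmonic_mean(acc_seen, acc_unseen):
    """Harmonic mean H, commonly used to summarize generalized ZSL accuracy."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# A biased model: high seen-class accuracy, near-zero unseen-class accuracy.
# The harmonic mean collapses, exposing the bias. (Hypothetical numbers.)
print(harmonic_mean(0.80, 0.05))  # ≈ 0.094
# A debiased model with more balanced accuracies scores far higher.
print(harmonic_mean(0.70, 0.55))  # ≈ 0.616
```

Because the harmonic mean punishes imbalance between the two accuracies, reducing the source-class bias translates directly into large gains on this metric.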
Implications and Future Directions
The implications of this work are twofold. Practically, the method enables more accurate classifications in ZSL tasks, particularly in generalized settings where the test data could belong to both seen and unseen classes. Theoretically, the approach broadens the methodology in ZSL research by emphasizing the practical use of unlabeled target data during the training phase. Future research could explore the adaptation of this approach to inductive settings, investigate alternative semantic representations beyond attributes, and extend the methodology to other domains beyond visual object recognition.
This paper advances both the theoretical understanding and practical implementation of ZSL, providing a solid foundation for future work aimed at overcoming the inherent limitations of existing methods regarding class bias and performance in real-world applications.