Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective
The paper "Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective" addresses the inherent challenges faced by conventional zero-shot learning (ZSL) approaches which typically focus on semantic-visual correspondence via feature space mappings. The authors argue that these methods inadvertently discard the discriminative power inherent in visual features, which compromises performance. Instead, they propose a reformulated approach treating ZSL as a conditional visual classification problem. This innovative interpretation suggests that visual feature classification should be conditioned directly on class-specific semantic descriptions.
Methodology and Algorithmic Innovations
The authors' methodology centers on a deep neural network that generates visual-feature classifiers directly from semantic attributes (a sketch follows). Because classification happens in the visual feature space, this conditional approach retains visual discriminability and effectively exploits inter-class competition. The network is trained with an episode-based strategy that repeatedly samples mimetic ZSL tasks from the seen classes, so the generator adapts better to genuinely unseen classes at test time.
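A minimal sketch of the classifier generator and episode sampling, assuming PyTorch; the name ClassifierGenerator, the layer sizes, and the n_way/k_shot episode parameters are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ClassifierGenerator(nn.Module):
    """Maps class-level semantic attribute vectors to the weights of
    visual-feature classifiers (one weight vector per class).
    Hypothetical sketch: layer sizes are assumed, not the paper's."""

    def __init__(self, attr_dim: int, feat_dim: int, hidden_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(attr_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, attrs: torch.Tensor) -> torch.Tensor:
        # attrs: (num_classes, attr_dim) -> weights: (num_classes, feat_dim)
        return self.net(attrs)

def sample_episode(feats_by_class, attrs, n_way=16, k_shot=4):
    """Samples a mimetic ZSL task from the seen classes: n_way classes,
    k_shot visual features per class. n_way/k_shot are assumed values."""
    classes = torch.randperm(attrs.size(0))[:n_way]
    feats, labels = [], []
    for i, c in enumerate(classes.tolist()):
        idx = torch.randperm(feats_by_class[c].size(0))[:k_shot]
        feats.append(feats_by_class[c][idx])
        labels.append(torch.full((k_shot,), i, dtype=torch.long))
    return torch.cat(feats), torch.cat(labels), attrs[classes]
```

Each training iteration then generates classifiers for the episode's classes and scores the episode's features against them, mimicking a fresh ZSL task.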
- Conventional ZSL: A deep neural network converts semantic attributes into visual-feature classifiers, trained with a cosine similarity-based cross-entropy loss; normalizing both features and classifiers removes magnitude differences and so mitigates the variance mismatch between the semantic and visual domains (see the loss sketch after this list).
- Generalized ZSL: The classifiers learned for seen classes are combined with generated classifiers for unseen classes, letting the network recognize visual features over the expanded class set; because the seen-class classifiers remain discriminative, seen-class accuracy degrades little (a prediction sketch follows the list).
- Transductive ZSL: Unlabeled test data are leveraged to calibrate the classifier generator. A learning-without-forgetting self-training mechanism based on the generalized cross-entropy loss dampens the effect of incorrect pseudo labels while preserving recognition accuracy on seen classes (a loss sketch follows the list).
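The cosine similarity-based cross-entropy loss from the first bullet can be sketched as follows, again assuming PyTorch; the temperature value scale=10.0 is an assumption, not the paper's setting:

```python
import torch
import torch.nn.functional as F

def cosine_ce_loss(features, classifiers, labels, scale=10.0):
    """Cross-entropy over scaled cosine similarities. L2-normalizing
    both the visual features and the generated classifiers removes
    magnitude (variance) differences between the two domains."""
    f = F.normalize(features, dim=1)      # (B, D) unit-norm features
    w = F.normalize(classifiers, dim=1)   # (C, D) unit-norm classifiers
    logits = scale * f @ w.t()            # (B, C) scaled cosine scores
    return F.cross_entropy(logits, labels)
```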
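For the generalized setting, prediction over the union of classes might look like the following sketch; the trained generator, the seen-class classifier matrix w_seen, and the tensor shapes are assumptions carried over from the earlier sketches:

```python
import torch
import torch.nn.functional as F

def gzsl_predict(test_feats, w_seen, generator, unseen_attrs):
    """Scores features against seen plus generated unseen classifiers.
    Assumed shapes: w_seen (C_seen, D), unseen_attrs (C_unseen, A)."""
    with torch.no_grad():
        w_unseen = generator(unseen_attrs)            # (C_unseen, D)
        w_all = torch.cat([w_seen, w_unseen], dim=0)  # expanded class set
        logits = F.normalize(test_feats, dim=1) @ F.normalize(w_all, dim=1).t()
    return logits.argmax(dim=1)  # class index over seen + unseen classes
```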
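For the transductive setting, the generalized cross-entropy loss, in the sense of Zhang and Sabuncu (2018), down-weights low-confidence pseudo labels, which are the ones most likely to be wrong. A sketch with an assumed q value:

```python
import torch
import torch.nn.functional as F

def generalized_ce_loss(logits, pseudo_labels, q=0.7):
    """Generalized cross-entropy: (1 - p_y^q) / q. As q -> 0 this
    recovers standard cross-entropy; q = 1 gives MAE. Intermediate q
    trades noise robustness against convergence speed. q=0.7 is an
    assumed default, not necessarily the paper's setting."""
    probs = F.softmax(logits, dim=1)
    p_y = probs.gather(1, pseudo_labels.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.pow(q)) / q).mean()
```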
Empirical Evaluation
Extensive experiments on several benchmark datasets show that the reformulated approach significantly outperforms existing ZSL algorithms, especially in the generalized setting. Notably, it maintains strong unseen-class accuracy without the sharp performance drops other methods exhibit as the number of candidate classes grows.
Practical and Theoretical Implications
Practically, this reformulation of ZSL challenges existing methodologies, providing a framework capable of better leveraging the rich variability within visual feature spaces. The approach opens pathways to optimize classifier generation processes in domains where acquiring labeled data is highly resource-intensive. Theoretically, it contributes a nuanced perspective on embedding design, emphasizing the importance of maintaining intra-domain discriminability while facilitating cross-domain semantic interpretations.
Future Directions
The research effectively integrates episode-based learning into the ZSL domain and suggests potential extensions in transductive settings by exploring unlabeled data utilization further. Future work could enhance the robustness of class separation by incorporating more sophisticated competitive strategies and address scalability concerns associated with expanding the semantic space definition in complex real-world scenarios.
In conclusion, the paper compellingly advocates for reconsidering ZSL's foundational approach, proposing a conditional viewpoint that harnesses visual discriminability while integrating semantic attributes for classifier generation. This perspective not only improves ZSL performance but also paves the way for new methodologies and applications in artificial intelligence.