Transferable Contrastive Network for Generalized Zero-Shot Learning
This paper presents a novel approach to generalized zero-shot learning (GZSL) through the introduction of the Transferable Contrastive Network (TCN). The TCN is designed to address the intrinsic challenge of zero-shot learning tasks, wherein models must recognize target categories, for which no labeled data is available, by using semantic information to transfer knowledge from source classes. A common issue in existing models is their tendency to overfit the source classes during the GZSL task, leaving them with little ability to recognize target classes. The TCN seeks to overcome this limitation by explicitly facilitating knowledge transfer between source and target classes based on their semantic similarities.
Model Architecture and Methodology
The TCN incorporates two fundamental properties in its architecture: a discriminative property and a transferable property. While the discriminative property ensures that the model is adept at distinguishing between source classes, the transferable property encourages the learned consistency scores to generalize from source classes to target classes rather than overfitting to the former. The network contrasts an image's feature representation against the semantic features of different classes, thereby assessing the consistency between the image and each class.
Key components of the TCN include:
Information Fusion: This involves encoding images and semantic information into a shared latent feature space using two separate branches (a CNN for images and an MLP for semantic data). An element-wise product operation then fuses the two representations.
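The fusion step above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the dimensions, the single linear layer standing in for the CNN projection head, and the one-hidden-layer MLP are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper):
# e.g. ResNet-style image features and AWA-style attribute vectors.
D_IMG, D_SEM, D_LATENT = 2048, 85, 512

# Image branch: a single linear layer standing in for the CNN projection head.
W_img = rng.standard_normal((D_IMG, D_LATENT)) * 0.01
# Semantic branch: a one-hidden-layer MLP on class attribute vectors.
W_sem1 = rng.standard_normal((D_SEM, 256)) * 0.1
W_sem2 = rng.standard_normal((256, D_LATENT)) * 0.1

def relu(x):
    return np.maximum(x, 0.0)

def fuse(image_feat, class_attr):
    """Encode both modalities into the shared latent space and fuse
    them with an element-wise (Hadamard) product."""
    z_img = relu(image_feat @ W_img)
    z_sem = relu(relu(class_attr @ W_sem1) @ W_sem2)
    return z_img * z_sem  # element-wise product fusion

x = rng.standard_normal(D_IMG)  # a CNN image feature
a = rng.random(D_SEM)           # a class attribute vector
fused = fuse(x, a)
print(fused.shape)              # one fused vector per (image, class) pair
```

Because both branches end in a ReLU, the fused vector is non-negative; each coordinate is large only when both modalities agree on that latent dimension, which is the intuition behind choosing an element-wise product over, say, concatenation.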
Contrastive Learning: A contrastive network evaluates fused features to derive contrastive values that indicate the likelihood of an image being consistent with specific class semantics. This stage employs a learning mechanism that focuses on maximizing consistency for matching pairs and minimizing it for mismatched pairs.
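The matched/mismatched objective described above can be illustrated with a binary cross-entropy formulation. This is a hedged sketch under assumed shapes: the linear contrastive head, the three candidate classes, and the random fused vectors are hypothetical stand-ins for the network's actual fusion outputs.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical setup: fused features for one image against 3 candidate
# classes (the fusion step is abstracted into random vectors here).
D = 8
w = rng.standard_normal(D) * 0.1     # contrastive head: fused vector -> scalar
fused = rng.standard_normal((3, D))  # image fused with 3 class embeddings
true_class = 0

# Contrastive value: likelihood that image and class are consistent.
scores = sigmoid(fused @ w)

# Binary targets: 1 for the matching class, 0 for mismatched classes.
targets = np.zeros(3)
targets[true_class] = 1.0

# Cross-entropy pushes matched scores toward 1 and mismatched toward 0,
# i.e. maximizes consistency for matching pairs, minimizes it otherwise.
eps = 1e-12
loss = -np.mean(targets * np.log(scores + eps)
                + (1.0 - targets) * np.log(1.0 - scores + eps))
print(loss)
```

At inference, classification reduces to picking the class whose semantic embedding yields the highest contrastive value for the given image.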
Additionally, the model employs a mechanism to leverage class similarities for enhancing knowledge transfer from source images to similar target classes, thus addressing the lack of labeled data for the latter.
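One plausible way to realize this similarity-based transfer is to turn attribute-space similarities between source and target classes into soft supervision for source images. The sketch below is an assumption-laden illustration (cosine similarity plus a row-wise softmax), not the paper's exact formulation; the class counts and attribute dimension are made up.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical attribute vectors: 4 source classes, 3 target classes,
# each described by a 6-dimensional semantic attribute vector.
src_attrs = rng.random((4, 6))
tgt_attrs = rng.random((3, 6))

def cosine_sim(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Similarity of each source class to every target class, normalized with
# a softmax so each row forms a soft label distribution over target classes.
sim = cosine_sim(src_attrs, tgt_attrs)
soft_labels = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)

# A source image of class i can then be trained so its contrastive scores
# on target classes follow soft_labels[i], transferring supervision to
# semantically similar target classes despite the lack of labeled data.
print(soft_labels[0])  # soft label over target classes for source class 0
```

The effect is that a source image of, say, a horse contributes gradient signal toward target classes like zebra in proportion to their semantic closeness.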
Performance Evaluation and Results
The model demonstrates superior performance in ZSL tasks across multiple benchmark datasets, including APY, AWA1, AWA2, CUB, and SUN. It particularly excels in handling fine-grained recognition on datasets like CUB, significantly improving target class recognition in GZSL over existing baselines. Notably, the TCN architecture effectively mitigates the domain shift problem observed when transitioning from zero-shot to generalized zero-shot learning tasks.
Implications and Future Directions
The results presented in this paper underscore the potential of TCNs to significantly enhance the robustness of zero-shot learning frameworks. By efficiently transferring knowledge via semantic similarities, this approach helps alleviate overfitting to source classes and improve recognition of target classes.
Future research avenues may include refining the semantic similarity measures and extending contrastive network models with dynamic or online learning components that can adapt to evolving class semantics or tackle more complex GZSL settings. Additionally, integrating the TCN with generative models could offer synergies that yield even more effective GZSL performance, presenting exciting opportunities for advancing AI capabilities in zero-shot learning scenarios.