
DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning (2207.01328v4)

Published 4 Jul 2022 in cs.CV and cs.AI

Abstract: Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during training. Attributes, which are annotations of class-level visual characteristics, are among the most effective and widely used forms of semantic information for zero-shot image classification. However, current methods often fail to discriminate subtle visual distinctions between images, owing not only to the shortage of fine-grained annotations but also to attribute imbalance and co-occurrence. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; and (3) propose a multi-task learning policy for considering multi-model objectives. We find that DUET achieves state-of-the-art performance on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark. Its components are effective and its predictions are interpretable.
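
The abstract names contrastive learning as one ingredient but gives no formula. As a rough illustration of the general idea only, the sketch below implements a generic InfoNCE-style contrastive loss for a single anchor embedding. It is not DUET's attribute-level loss: the function name `info_nce_loss`, the cosine similarity, the `temperature` value, and the toy 2-D vectors are all assumptions made for illustration.

```python
import math


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (illustrative, not DUET's loss).

    The loss is low when the anchor is more similar to the positive
    than to any of the negatives, and high otherwise.
    """

    def cosine(u, v):
        # Cosine similarity between two equal-length vectors.
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # Temperature-scaled similarities; the positive comes first.
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )

    # Negative log-probability of picking the positive among all candidates.
    return log_sum_exp - pos_logit


# Toy embeddings (hypothetical values, chosen only to show the behaviour).
anchor = [1.0, 0.0]
aligned = info_nce_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce_loss(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned < misaligned)
```

In a setting like the one the abstract describes, the anchor and positive would presumably share an attribute while the negatives differ in it, but how DUET actually constructs those pairs is not stated here.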

Authors (8)
  1. Zhuo Chen (319 papers)
  2. Yufeng Huang (14 papers)
  3. Jiaoyan Chen (85 papers)
  4. Yuxia Geng (22 papers)
  5. Wen Zhang (170 papers)
  6. Yin Fang (32 papers)
  7. Jeff Z. Pan (78 papers)
  8. Huajun Chen (198 papers)
Citations (47)
