Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 147 tok/s

Gemini 2.5 Pro 52 tok/s Pro

GPT-5 Medium 27 tok/s Pro

GPT-5 High 30 tok/s Pro

GPT-4o 96 tok/s Pro

Kimi K2 188 tok/s Pro

GPT OSS 120B 398 tok/s Pro

Claude Sonnet 4.5 36 tok/s Pro

2000 character limit reached

Multi-level Cross-modal Feature Alignment via Contrastive Learning towards Zero-shot Classification of Remote Sensing Image Scenes (2306.06066v1)

Published 31 May 2023 in cs.CV and cs.LG

Abstract: Zero-shot classification of image scenes which can recognize the image scenes that are not seen in the training stage holds great promise of lowering the dependence on large numbers of labeled samples. To address the zero-shot image scene classification, the cross-modal feature alignment methods have been proposed in recent years. These methods mainly focus on matching the visual features of each image scene with their corresponding semantic descriptors in the latent space. Less attention has been paid to the contrastive relationships between different image scenes and different semantic descriptors. In light of the challenge of large intra-class difference and inter-class similarity among image scenes and the potential noisy samples, these methods are susceptible to the influence of the instances which are far from these of the same classes and close to these of other classes. In this work, we propose a multi-level cross-modal feature alignment method via contrastive learning for zero-shot classification of remote sensing image scenes. While promoting the single-instance level positive alignment between each image scene with their corresponding semantic descriptors, the proposed method takes the cross-instance contrastive relationships into consideration,and learns to keep the visual and semantic features of different classes in the latent space apart from each other. Extensive experiments have been done to evaluate the performance of the proposed method. The results show that our proposed method outperforms state of the art methods for zero-shot remote sensing image scene classification. All the code and data are available at github https://github.com/masuqiang/MCFA-Pytorch

Citations (1)

View on Semantic Scholar