FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding (2103.05950v2)

Published 10 Mar 2021 in cs.CV

Abstract: There is emerging interest in recognizing previously unseen objects given very few training examples, a task known as few-shot object detection (FSOD). Recent research demonstrates that a good feature embedding is the key to favorable few-shot learning performance. We observe that object proposals with different Intersection-over-Union (IoU) scores are analogous to the intra-image augmentation used in contrastive approaches. We exploit this analogy and incorporate supervised contrastive learning to achieve more robust object representations in FSOD. We present Few-Shot object detection via Contrastive proposals Encoding (FSCE), a simple yet effective approach to learning contrastive-aware object proposal encodings that facilitate the classification of detected objects. We notice that the degradation of average precision (AP) for rare objects mainly comes from misclassifying novel instances as confusable classes. We ease these misclassification issues by promoting instance-level intra-class compactness and inter-class variance via our contrastive proposal encoding loss (CPE loss). Our design outperforms current state-of-the-art works in every shot setting and on all data splits, with up to +8.8% on the standard PASCAL VOC benchmark and +2.7% on the challenging COCO benchmark. Code is available at: https://github.com/MegviiDetection/FSCE

Authors (5): Bo Sun, Banghuai Li, Shengcai Cai, Ye Yuan, Chi Zhang

Citations (324)

Summary

The paper introduces FSCE, a novel method that integrates contrastive learning to enhance few-shot object detection by encoding object proposals.
The approach extends Faster R-CNN with a contrastive branch, reducing intra-class variance and improving inter-class separation.
FSCE achieves notable performance gains, with up to an 8.8% improvement in novel average precision on PASCAL VOC, while maintaining robust base class detection.

Few-Shot Object Detection via Contrastive Proposal Encoding

In the paper "FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding," the authors address the challenges of few-shot object detection (FSOD). Their main contribution, Few-Shot object detection via Contrastive proposals Encoding (FSCE), applies contrastive learning to object proposals to improve detection when only a few labeled examples of a new class are available.

Summary

Few-shot learning aims to let a model recognize new object classes from only a handful of labeled examples. The core difficulty in FSOD is that conventional deep detectors are trained on large datasets and tend to overfit when a class has only a few annotated instances. FSCE addresses this by using contrastive learning to make the features of object proposals more discriminative.

The authors propose encoding object proposals with contrastive-aware features, which decreases intra-class variance and increases inter-class separation. This is achieved through a contrastive proposal encoding loss (CPE loss) that pulls the features of same-class proposals together and pushes different-class proposals apart, targeting the main error mode the authors identify: novel instances being misclassified as visually similar (confusable) classes. The authors report that the approach outperforms existing state-of-the-art methods across data splits on both PASCAL VOC and COCO, with improvements of up to 8.8% and 2.7%, respectively.
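The loss described above has the form of a supervised contrastive loss over proposal encodings. The following is a minimal sketch under our own assumptions, not the authors' code: embeddings are plain Python lists, `tau` is a hypothetical temperature default, and the optional per-proposal `weights` stand in for a gate such as proposal consistency control. For each anchor proposal, other proposals with the same label are positives and every other proposal appears in the softmax denominator; the result is averaged over all N proposals, with anchors that have no positives or zero weight contributing nothing.

```python
import math
from typing import List, Optional, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (project it onto the unit hypersphere)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def contrastive_proposal_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    tau: float = 0.2,  # hypothetical temperature
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Supervised contrastive loss over proposal encodings (illustrative sketch)."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives or weights[i] == 0.0:
            continue  # anchor without positives, or gated out: contributes nothing
        # Temperature-scaled cosine similarity between anchor i and every other proposal.
        logits = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / tau for k in range(n) if k != i
        }
        # Log of the softmax denominator, computed with the log-sum-exp trick.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability assigned to same-class proposals.
        anchor_loss = -sum(logits[j] - log_denom for j in positives) / len(positives)
        total += weights[i] * anchor_loss
    return total / n if n else 0.0
```

Encodings that cluster by class give a small loss, while encodings whose same-class proposals point in different directions give a large one; minimizing it is what drives the intra-class compactness and inter-class separation described above.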

Methodology

The FSCE framework builds on the two-stage fine-tuning strategy commonly used in FSOD, in which a detector is first trained on data-rich base classes and then fine-tuned on a small set that includes the novel classes. It extends Faster R-CNN by adding a contrastive branch to the Region-of-Interest (RoI) feature head, which maps each proposal's RoI feature to an encoding used for contrastive learning. A contrastive loss designed specifically for FSOD then operates on the cosine similarity between these encodings after they are projected onto a unit hypersphere.
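To make the contrastive branch concrete, here is a minimal sketch of a projection head that maps an RoI feature vector to a unit-length encoding, so that the dot product of two encodings equals their cosine similarity. Everything here is an assumption for illustration: `ProjectionHead` is a hypothetical single linear layer with random weights, the dimensions are arbitrary, and `joint_loss` assumes the contrastive term is added to the standard Faster R-CNN losses with a hypothetical weight `lam`.

```python
import math
import random
from typing import List, Sequence


class ProjectionHead:
    """Hypothetical contrastive branch: one linear layer followed by L2 normalization."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        # out_dim x in_dim weight matrix with small random entries; zero bias.
        self.weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, roi_feature: Sequence[float]) -> List[float]:
        z = [
            sum(w * x for w, x in zip(row, roi_feature)) + b
            for row, b in zip(self.weight, self.bias)
        ]
        norm = math.sqrt(sum(v * v for v in z)) or 1.0  # avoid dividing by zero
        return [v / norm for v in z]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit-length encodings the dot product is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def joint_loss(l_rpn: float, l_cls: float, l_reg: float, l_contrastive: float,
               lam: float = 0.5) -> float:
    """Standard detector losses plus a weighted contrastive term (lam is assumed)."""
    return l_rpn + l_cls + l_reg + lam * l_contrastive
```

Normalizing inside the head means the contrastive loss sees only the direction of each encoding, which is what projecting onto a unit hypersphere amounts to; the magnitude of the RoI feature no longer matters.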

A key component is proposal consistency control: only proposals whose IoU with a ground-truth box is high enough are used in contrastive learning, so each proposal still covers mostly a single object and its encoding stays semantically consistent with its label. The authors report that this yields more robust object features and a substantial improvement in the classification accuracy of novel instances.
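Proposal consistency control can be sketched as an IoU gate. The code below is an illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, each proposal's IoU is taken as its best overlap with any ground-truth box, and `phi = 0.7` is a hypothetical threshold default. The resulting 0/1 weights could multiply each proposal's term in a contrastive loss, so that low-IoU proposals, which may mostly cover background or a neighboring object, are excluded.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def consistency_weights(proposals: Sequence[Box], gt_boxes: Sequence[Box],
                        phi: float = 0.7) -> List[float]:
    """1.0 for proposals whose best IoU with a ground-truth box is >= phi, else 0.0."""
    weights = []
    for p in proposals:
        # With no ground-truth boxes, every proposal is treated as background.
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        weights.append(1.0 if best >= phi else 0.0)
    return weights
```

A hard 0/1 gate is the simplest choice; a soft weight that grows with IoU is an obvious variant of the same idea.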

Results and Analysis

FSCE sets new state-of-the-art few-shot detection results across multiple datasets. On PASCAL VOC, it achieves the best novel average precision (nAP50) in multiple evaluation settings and improves results on all three novel splits. The gains are most notable on PASCAL VOC split 3, where the method surpasses existing approaches by up to 8.8%.
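For readers unfamiliar with the metric, nAP50 is average precision at an IoU threshold of 0.5, averaged over the novel classes. The sketch below assumes detections have already been matched to ground truth (a detection is a true positive if it overlaps a not-yet-matched ground-truth box with IoU of at least 0.5) and uses all-point interpolation; benchmarks that follow the PASCAL VOC 2007 protocol often use an 11-point variant instead, so exact numbers may differ. The function names are our own.

```python
from typing import Dict, Sequence, Tuple


def average_precision(detections: Sequence[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP for one class.

    detections: (confidence, is_true_positive) pairs; matching is assumed done.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at recall r is the best precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each increase in recall (area under the PR curve).
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def novel_ap50(per_class_ap: Dict[str, float], novel_classes: Sequence[str]) -> float:
    """Mean AP50 over the novel classes only."""
    return sum(per_class_ap[c] for c in novel_classes) / len(novel_classes)
```

Base-class AP is computed the same way over the base classes, which is how base forgetting shows up in the reported numbers.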

Beyond these gains, FSCE maintains strong base-class detection, limiting base forgetting, the drop in base-class accuracy that often follows fine-tuning on novel classes and a common challenge in few-shot settings. This balance between novel and base class detection makes FSCE a robust method for generalized few-shot object detection.

Implications and Future Work

The incorporation of contrastive learning into FSOD opens new avenues for research in visual representation learning. FSCE's ability to adapt effectively to novel instances with sparse annotations suggests its potential applicability to domains requiring rapid adaptation to new object categories, such as robotics and real-time surveillance systems.

Future work could extend FSCE to detector architectures other than Faster R-CNN. Further investigation into balancing classification and localization, particularly in extremely low-shot settings, may also yield gains. Combining FSCE with semi-supervised or unsupervised learning techniques is another promising direction toward more data-efficient object detection.

GitHub

GitHub - megvii-research/FSCE