Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector (1908.01998v4)

Published 6 Aug 2019 in cs.CV

Abstract: Conventional methods for object detection typically require a substantial amount of training data and preparing such high-quality training data is very labor-intensive. In this paper, we propose a novel few-shot object detection network that aims at detecting objects of unseen categories with only a few annotated examples. Central to our method are our Attention-RPN, Multi-Relation Detector and Contrastive Training strategy, which exploit the similarity between the few shot support set and query set to detect novel objects while suppressing false detection in the background. To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations. To the best of our knowledge, this is one of the first datasets specifically designed for few-shot object detection. Once our few-shot network is trained, it can detect objects of unseen categories without further training or fine-tuning. Our method is general and has a wide range of potential applications. We produce a new state-of-the-art performance on different datasets in the few-shot setting. The dataset link is https://github.com/fanq15/Few-Shot-Object-Detection-Dataset.

Authors (4)

Qi Fan (30 papers)
Wei Zhuo (24 papers)
Chi-Keung Tang (81 papers)
Yu-Wing Tai (123 papers)

Citations (500)

Summary

The paper introduces a novel architecture combining Attention-RPN and Multi-Relation Detector to enhance object proposals using support image cues.
It employs a contrastive training strategy that uses triplets to separate same-category from different-category matches, reducing false positives.
Results on the FSOD and ImageNet benchmarks show state-of-the-art AP50 and AP75 in the few-shot setting, indicating that the model generalizes to unseen categories.

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

The paper "Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector" addresses few-shot object detection: detecting objects of previously unseen categories from only a few annotated examples. It introduces three components, an Attention-RPN, a Multi-Relation Detector, and a Contrastive Training strategy, which together detect novel objects while suppressing false positives in the background.

Methodology and Key Contributions

The authors propose a novel network architecture that leverages an attention mechanism and multi-relation detection modules to match and differentiate between support and query image sets:

Attention-Based RPN: This component improves proposal quality by feeding support image information into the region proposal network, which filters out background and non-target-category proposals. The attention is implemented via depth-wise cross-correlation between support and query features, so that query regions resembling the support object produce stronger responses.
Multi-Relation Detector: This comprises three relation heads (global, local, and patch) that model different relationships between query and support proposals. The global head captures overall feature similarities, while the local and patch heads focus on pixel-level and region-level interactions, respectively.
Contrastive Training Strategy: Unlike single-way training, which matches a query against one support category at a time, this approach builds a triplet consisting of a query image, a support image of the same category, and a support image of a different category, which strengthens the model's ability to distinguish between categories.
Few-Shot Object Detection Dataset (FSOD): The paper also introduces a new dataset containing 1000 categories, specifically designed to support few-shot learning objectives. This dataset offers significant diversity to train and evaluate models, setting a new standard for few-shot detection benchmarks.
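
The depth-wise cross-correlation behind the Attention-Based RPN can be illustrated with a small sketch. In its standard form, with support kernel $X$ and query feature map $Y$ (0-based indices), the output is $G_{h,w,c} = \sum_{i,j} X_{i,j,c} \cdot Y_{h+i,\,w+j,\,c}$: each channel is correlated only with its own counterpart. The code below is a minimal pure-Python illustration, not the authors' implementation; the function names, the `[channel][row][col]` layout, and the step that average-pools the support map to a 1x1 kernel are assumptions made for this example.

```python
from typing import List

Feature = List[List[List[float]]]  # hypothetical layout: [channel][row][col]


def depthwise_cross_correlation(query: Feature, kernel: Feature) -> Feature:
    """Correlate each query channel with the matching kernel channel ("valid" mode).

    Channels are never mixed, which is what makes the operation depth-wise.
    """
    if len(query) != len(kernel):
        raise ValueError("query and kernel must have the same number of channels")
    out: Feature = []
    for q_ch, k_ch in zip(query, kernel):
        kh, kw = len(k_ch), len(k_ch[0])
        oh, ow = len(q_ch) - kh + 1, len(q_ch[0]) - kw + 1
        out.append([
            [
                sum(k_ch[i][j] * q_ch[h + i][w + j]
                    for i in range(kh) for j in range(kw))
                for w in range(ow)
            ]
            for h in range(oh)
        ])
    return out


def average_pool_to_1x1(feature: Feature) -> Feature:
    """Collapse each channel of a support map to one value (assumed kernel choice)."""
    return [[[sum(map(sum, ch)) / (len(ch) * len(ch[0]))]] for ch in feature]


# Toy example: 2 channels, a 2x2 query map and a 2x2 support map.
query = [[[1.0, 2.0], [3.0, 4.0]],
         [[1.0, 1.0], [1.0, 1.0]]]
support = [[[2.0, 2.0], [2.0, 2.0]],   # channel 0 is active in the support
           [[0.0, 0.0], [0.0, 0.0]]]   # channel 1 is silent in the support

attention = depthwise_cross_correlation(query, average_pool_to_1x1(support))
print(attention)  # [[[2.0, 4.0], [6.0, 8.0]], [[0.0, 0.0], [0.0, 0.0]]]
```

With a 1x1 kernel the operation reduces to scaling each query channel by the pooled support response, so channels that are inactive in the support are suppressed in the attention map that feeds proposal generation. In a deep learning framework the same computation is usually expressed as a grouped convolution with one group per channel.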

Results and Evaluation

The proposed method reports state-of-the-art performance across several datasets in the few-shot setting. On the ImageNet Detection dataset, it improves both $AP_{50}$ and $AP_{75}$ over prior approaches. The FSOD dataset, with its extensive category diversity, additionally serves as a benchmark for few-shot object detection.
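
For readers unfamiliar with the metrics: $AP_{50}$ and $AP_{75}$ are average precision computed with intersection-over-union (IoU) matching thresholds of 0.5 and 0.75, so $AP_{75}$ rewards tighter localization. The sketch below shows only the matching criterion, not the full precision-recall computation; the `(x1, y1, x2, y2)` box format and the helper names are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float) -> bool:
    """A prediction matches a ground-truth box if their IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


gt = (0.0, 0.0, 10.0, 10.0)
pred = (2.0, 0.0, 12.0, 10.0)  # same size, shifted 2 units to the right

print(round(iou(pred, gt), 3))            # 0.667
print(is_true_positive(pred, gt, 0.50))   # True: counts toward AP50
print(is_true_positive(pred, gt, 0.75))   # False: too loose for AP75
```

The same detection can therefore count as correct under $AP_{50}$ and incorrect under $AP_{75}$, which is why gains on both metrics say something about localization quality as well as recognition.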

The empirical evaluation shows that the model, once trained on FSOD, can be applied directly to new datasets, often without any further fine-tuning, and still outperforms conventionally fine-tuned models. This underlines its generalization ability, a core requirement for few-shot learning.

Implications and Future Developments

The proposed framework presents significant practical implications for scenarios requiring flexible adaptation to new categories, such as automated visual inspection and dynamic inventory systems. The research opens avenues for exploration of further modular components like hybrid attention mechanisms and advanced relational embeddings.

The introduction of the FSOD dataset is particularly impactful, inviting further research to refine and extend existing methodologies. Future developments might focus on reducing model complexity while maintaining performance, or integrating complementary learning paradigms like meta-learning to enhance adaptability to new, unseen data domains.

Overall, this paper makes a substantial contribution to few-shot learning, presenting an effective and adaptable approach to object detection in settings where annotated data is scarce.

GitHub

GitHub - fanq15/Few-Shot-Object-Detection-Dataset (380 stars)