Few-shot Object Counting and Detection
The paper introduces Few-Shot Object Counting and Detection (FSCD), a task that requires counting and detecting all objects of a target class in an image given only a few exemplar bounding boxes. The task sits at the intersection of Few-Shot Object Counting (FSC) and Few-Shot Object Detection (FSOD), but adds complexity by requiring both a count and a bounding box for every instance despite limited supervision.
Key Contributions
- Two-Stage Training Strategy: The authors propose a two-stage training process. The first stage generates pseudo ground-truth bounding boxes from dot annotations using a novel few-shot object detector named Counting-DETR. This compensates for sparse annotation, since bounding boxes are not available for every object. The second stage fine-tunes the detector on the generated pseudo-labels while accounting for their potential imperfections.
- Counting-DETR: This uncertainty-aware object detector is designed to address the FSCD challenge. Built on the Anchor DETR framework, Counting-DETR handles imperfect pseudo-labels by estimating uncertainty and using it during training, which improves the robustness and accuracy of its bounding box predictions.
- New Datasets: To evaluate the proposed approach, two datasets, FSCD-147 and FSCD-LVIS, are introduced. FSCD-147 extends the existing FSC-147 dataset by providing bounding box annotations for the validation and test sets. FSCD-LVIS, derived from the LVIS dataset, presents a more complex environment with multiple object classes and significant variation in object appearance. These datasets serve as benchmarks for FSCD tasks.
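To make the uncertainty-aware training described in the Counting-DETR item above more concrete, the following is a minimal sketch of one common way to let a detector down-weight unreliable box targets: a Laplace-style negative log-likelihood in which the model predicts a log-scale per box coordinate. This summary does not state the exact loss used in the paper, so the function name `uncertainty_l1_loss`, the `(cx, cy, w, h)` box layout, and the formulation itself are illustrative assumptions.

```python
import math


def uncertainty_l1_loss(pred_box, target_box, log_sigma):
    """Laplace-style negative log-likelihood for a single box.

    Assumption (not taken from the paper): boxes are (cx, cy, w, h) tuples
    and log_sigma holds one predicted log-scale per coordinate.
    """
    if not (len(pred_box) == len(target_box) == len(log_sigma)):
        raise ValueError("pred_box, target_box and log_sigma must have equal length")

    total = 0.0
    for p, t, s in zip(pred_box, target_box, log_sigma):
        # |error| / sigma shrinks the penalty where the model is unsure,
        # while the + log(sigma) term stops sigma from growing without bound.
        total += abs(p - t) * math.exp(-s) + s
    return total / len(pred_box)


# Hypothetical pseudo-label whose width is badly off from the prediction.
pseudo_label = (0.5, 0.5, 0.2, 0.2)
prediction = (0.5, 0.5, 2.2, 0.2)

confident = uncertainty_l1_loss(prediction, pseudo_label, (0.0, 0.0, 0.0, 0.0))
hedged = uncertainty_l1_loss(prediction, pseudo_label, (0.0, 0.0, math.log(2.0), 0.0))
```

With all log-scales at zero the loss reduces to a plain mean L1 error. Raising the log-scale on the noisy width coordinate lowers the loss in this example, which is how such a term can keep a noisy pseudo-label from dominating the gradient.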
Experimental Results
The approach outperforms several strong baselines adapted from FSC and FSOD on both the FSCD-147 and FSCD-LVIS datasets. Comparative analyses show that Counting-DETR scores better than these baselines on both counting and detection metrics, which supports the effectiveness of the proposed methods.
- Performance Metrics: The results show improvements in Mean Absolute Error (MAE) for counting and Average Precision (AP) for detection. On FSCD-147, Counting-DETR is both more accurate and more interpretable, since it outputs a bounding box for every object it counts.
- Impact of Components: Ablation studies underscore the contributions of pseudo ground-truth generation and uncertainty estimation. Both components are pivotal to detection accuracy and robustness in the densely packed, complex scenes typical of FSCD-LVIS.
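As a rough illustration of the metrics named in the list above, the sketch below computes MAE over per-image counts and the intersection-over-union (IoU) between two boxes, the overlap measure on which AP-style detection scores are typically built. The helper names and the `(x1, y1, x2, y2)` box format are assumptions for this example, not the paper's evaluation code, and a full AP computation (score ranking, matching, precision-recall integration) is omitted.

```python
def mean_absolute_error(pred_counts, true_counts):
    """Average absolute difference between predicted and true object counts."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("count lists must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(pred_counts)


def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical per-image counts: absolute errors are 2, 0 and 3.
mae = mean_absolute_error([10, 20, 30], [12, 20, 27])

# Two 2x2 boxes overlapping in a 1x1 square.
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

In a typical detection evaluation, a predicted box counts as a true positive when its IoU with an unmatched ground-truth box exceeds a chosen threshold, and AP summarizes precision over recall levels under that matching rule.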
Implications and Future Work
The work opens directions for both theoretical exploration and practical application in object detection and counting with minimal supervision. Integrating uncertainty estimation into the two-stage training paradigm shows one way to improve model performance in few-shot settings where labels are imperfect.
Future directions could involve exploring alternative methods for pseudo-label generation, potentially incorporating semi-supervised learning techniques to further improve model generalization. Additionally, extending this approach to scenarios with dynamic changes or video streams could broaden its applicability in real-time detection systems.
In conclusion, the authors effectively address a sophisticated problem at the crossroads of counting and detection under low supervision. Through methodological innovations and comprehensive experimentation, this work significantly contributes to the advancement of few-shot learning methodologies.