
Few-shot Object Counting and Detection (2207.10988v2)

Published 22 Jul 2022 in cs.CV

Abstract: We tackle a new task of few-shot object counting and detection. Given a few exemplar bounding boxes of a target object class, we seek to count and detect all objects of the target class. This task shares the same supervision as the few-shot object counting but additionally outputs the object bounding boxes along with the total object count. To address this challenging problem, we introduce a novel two-stage training strategy and a novel uncertainty-aware few-shot object detector: Counting-DETR. The former is aimed at generating pseudo ground-truth bounding boxes to train the latter. The latter leverages the pseudo ground-truth provided by the former but takes the necessary steps to account for the imperfection of pseudo ground-truth. To validate the performance of our method on the new task, we introduce two new datasets named FSCD-147 and FSCD-LVIS. Both datasets contain images with complex scenes, multiple object classes per image, and a huge variation in object shapes, sizes, and appearance. Our proposed approach outperforms very strong baselines adapted from few-shot object counting and few-shot object detection with a large margin in both counting and detection metrics. The code and models are available at https://github.com/VinAIResearch/Counting-DETR.

Few-shot Object Counting and Detection

The paper presents an innovative approach to the novel task of Few-Shot Object Counting and Detection (FSCD). This task involves counting and detecting all objects of a target class within an image, given only a few exemplar bounding boxes. This problem is positioned at the intersection of Few-Shot Object Counting (FSC) and Few-Shot Object Detection (FSOD), but introduces additional complexities by requiring both counting and bounding box detection despite limited supervision.
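The input/output contract of the task can be made concrete with a small sketch. The names `Box`, `Detection`, and `count_and_detect`, and the idea of thresholding confidence scores at 0.5, are illustrative assumptions rather than anything taken from the paper; the point is only that an FSCD system returns boxes, and the count is the number of boxes it keeps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates, (x1, y1) top-left to (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass(frozen=True)
class Detection:
    """One predicted object (hypothetical container, not the paper's data format)."""
    box: Box
    score: float  # detector confidence in [0, 1]


def count_and_detect(
    detections: List[Detection], score_threshold: float = 0.5
) -> Tuple[int, List[Box]]:
    """Keep confident detections; the object count is the number of kept boxes."""
    kept = [d.box for d in detections if d.score >= score_threshold]
    return len(kept), kept


# Example: three raw predictions, one of which is below the assumed threshold.
raw = [
    Detection(Box(0, 0, 10, 10), 0.9),
    Detection(Box(20, 20, 30, 30), 0.7),
    Detection(Box(40, 40, 50, 50), 0.2),
]
count, boxes = count_and_detect(raw)
```

Unlike a counting-only model, every unit in the count here is backed by a box that can be checked visually.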

Key Contributions

  1. Two-Stage Training Strategy: The authors propose a novel two-stage training strategy. The first stage generates pseudo ground-truth bounding boxes from dot annotations using a novel few-shot object detector named Counting-DETR. This compensates for the sparse annotation: only a few exemplar boxes are given, so most objects have no bounding box. The second stage trains the detector on these pseudo-labels while accounting for their imperfections.
  2. Counting-DETR: This uncertainty-aware object detector is designed to address the FSCD challenge. Built upon the Anchor DETR framework, Counting-DETR integrates mechanisms to handle the imperfect pseudo-labels by estimating and leveraging uncertainty during training. This improves robustness and accuracy in bounding box predictions.
  3. New Datasets: To evaluate the proposed approach, two datasets, FSCD-147 and FSCD-LVIS, are introduced. FSCD-147 extends the existing FSC-147 dataset by providing bounding box annotations for the validation and test sets. FSCD-LVIS, derived from the LVIS dataset, presents a more complex environment with multiple object classes and significant variation in object appearance. These datasets serve as benchmarks for FSCD tasks.
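To illustrate what "estimating and leveraging uncertainty" can look like for box regression, here is a minimal sketch of one common formulation: a Laplace negative log-likelihood in which the model predicts a log-scale alongside each regressed value. This is an assumption chosen for illustration; the summary above does not specify the exact loss used by Counting-DETR, and the function name `laplace_nll` is hypothetical.

```python
import math


def laplace_nll(pred: float, target: float, log_scale: float) -> float:
    """Laplace negative log-likelihood, up to an additive constant.

    The absolute error is divided by a predicted scale, so a large scale
    down-weights errors against unreliable targets, while the added
    log-scale term penalizes claiming high uncertainty everywhere.
    """
    scale = math.exp(log_scale)  # exponentiate so the scale is always positive
    return abs(pred - target) / scale + log_scale


# A pseudo-label that is far from the prediction (error of 4 pixels):
noisy_confident = laplace_nll(pred=50.0, target=54.0, log_scale=0.0)
noisy_uncertain = laplace_nll(pred=50.0, target=54.0, log_scale=math.log(4.0))

# A pseudo-label that agrees with the prediction (error of 0.1 pixels):
clean_confident = laplace_nll(pred=50.0, target=50.1, log_scale=0.0)
clean_uncertain = laplace_nll(pred=50.0, target=50.1, log_scale=math.log(4.0))
```

With this kind of loss, predicting a larger scale lowers the penalty on a badly mismatched pseudo-label but raises it on a well-matched one, which is the trade-off that lets a detector tolerate imperfect pseudo ground truth.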

Experimental Results

The approach outperforms several strong baselines adapted from FSC and FSOD on both the FSCD-147 and FSCD-LVIS datasets. Comparative analyses show that Counting-DETR beats these baselines by a large margin on both counting and detection metrics.

  • Performance Metrics: Counting quality is measured with metrics such as Mean Absolute Error (MAE), and detection quality with Average Precision (AP); the results show notable improvements in both. On FSCD-147, Counting-DETR is also easier to interpret than a counting-only model, because every counted object comes with a bounding box.
  • Impact of Components: Ablation studies underscore the contributions of pseudo ground-truth generation and uncertainty estimation. Both components improve detection accuracy and robustness in the densely packed and complex scenes typical of FSCD-LVIS.
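The metrics named above can be sketched in a few lines. MAE is defined over predicted and true per-image counts; RMSE is included here as a commonly reported companion metric, which is an assumption rather than something stated in this summary. AP is built on matching predicted boxes to ground-truth boxes by Intersection over Union (IoU), so only the IoU step is shown. Function names and the `(x1, y1, x2, y2)` box layout are illustrative choices.

```python
import math
from typing import Sequence, Tuple

BoxT = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def mae(pred_counts: Sequence[float], true_counts: Sequence[float]) -> float:
    """Mean Absolute Error between predicted and true per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def rmse(pred_counts: Sequence[float], true_counts: Sequence[float]) -> float:
    """Root Mean Squared Error; penalizes large count errors more than MAE."""
    mean_sq = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / len(true_counts)
    return math.sqrt(mean_sq)


def iou(a: BoxT, b: BoxT) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Two images with predicted counts 10 and 20 against true counts 12 and 17.
count_mae = mae([10, 20], [12, 17])
count_rmse = rmse([10, 20], [12, 17])

# Two 2x2 boxes overlapping in a 1x1 square.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A prediction is typically counted as a correct detection when its IoU with an unmatched ground-truth box exceeds a chosen threshold, and AP summarizes precision over recall levels under that matching.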

Implications and Future Work

The research presented in this paper opens new avenues for both theoretical exploration and practical application in object detection and counting tasks with minimal supervision. The integration of uncertainty estimation within the two-stage training paradigm sets a precedent for improving model performance in few-shot learning environments.

Future directions could involve exploring alternative methods for pseudo-label generation, potentially incorporating semi-supervised learning techniques to further improve model generalization. Additionally, extending this approach to scenarios with dynamic changes or video streams could broaden its applicability in real-time detection systems.

In conclusion, the authors address a problem at the intersection of counting and detection under limited supervision. The work contributes a two-stage training strategy, an uncertainty-aware detector, and two benchmark datasets, backed by experiments on both.

Authors (4)
  1. Thanh Nguyen (70 papers)
  2. Chau Pham (15 papers)
  3. Khoi Nguyen (35 papers)
  4. Minh Hoai (48 papers)
Citations (35)