
Frustratingly Simple Few-Shot Object Detection (2003.06957v1)

Published 16 Mar 2020 in cs.CV

Abstract: Detecting rare objects from a few examples is an emerging problem. Prior works show meta-learning is a promising approach. But, fine-tuning techniques have drawn scant attention. We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task. Such a simple approach outperforms the meta-learning methods by roughly 2~20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods. However, the high variance in the few samples often leads to the unreliability of existing benchmarks. We revise the evaluation protocols by sampling multiple groups of training examples to obtain stable comparisons and build new benchmarks based on three datasets: PASCAL VOC, COCO and LVIS. Again, our fine-tuning approach establishes a new state of the art on the revised benchmarks. The code as well as the pretrained models are available at https://github.com/ucbdrive/few-shot-object-detection.

Frustratingly Simple Few-Shot Object Detection

This paper, "Frustratingly Simple Few-Shot Object Detection," examines how well plain fine-tuning works for few-shot object detection, in contrast to the meta-learning approaches that prior work has favored. Its central finding is that fine-tuning only the last layer of an existing object detector on rare classes outperforms meta-learning methods by roughly 2 to 20 points on current benchmarks, and sometimes doubles the accuracy of previous methods.
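To make the idea of "fine-tuning only the last layer" concrete, here is a minimal, framework-free sketch. It is not the authors' implementation: the `Param` class, the parameter names (`backbone.conv1`, `rpn.head`, `box_predictor.*`), and the hand-written gradients are all hypothetical stand-ins chosen for illustration. The sketch only shows the mechanism: mark everything except the final prediction layers as frozen, then apply updates to the trainable parameters alone.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Param:
    """A named parameter plus a flag saying whether fine-tuning may update it."""
    values: List[float]
    trainable: bool = True


def freeze_all_but(params: Dict[str, Param], trainable_prefixes: Tuple[str, ...]) -> None:
    """Keep only parameters whose name starts with one of the prefixes trainable."""
    for name, param in params.items():
        param.trainable = name.startswith(trainable_prefixes)


def sgd_step(params: Dict[str, Param], grads: Dict[str, List[float]], lr: float) -> None:
    """Plain SGD update that skips every frozen parameter."""
    for name, param in params.items():
        if not param.trainable:
            continue
        param.values = [v - lr * g for v, g in zip(param.values, grads[name])]


# Hypothetical detector pretrained on base classes (names are illustrative only).
detector = {
    "backbone.conv1": Param([0.5, -0.2]),
    "rpn.head": Param([0.1]),
    "box_predictor.cls": Param([0.3, 0.3]),
    "box_predictor.reg": Param([0.0, 1.0]),
}

# Fine-tuning stage: only the final prediction layers stay trainable.
freeze_all_but(detector, ("box_predictor.",))

# Dummy gradients of 1.0 everywhere, just to show which parameters move.
grads = {name: [1.0] * len(p.values) for name, p in detector.items()}
sgd_step(detector, grads, lr=0.1)

for name, p in detector.items():
    print(name, "trainable" if p.trainable else "frozen", p.values)
```

In a real framework the same effect is usually achieved by disabling gradients on the frozen modules; the dictionary-based version here just keeps the example self-contained.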

Key Contributions

  • Fine-Tuning Methodology: The paper uses a two-stage approach. First, a complete object detector, such as Faster R-CNN, is trained on the base classes. Then only the final layers are fine-tuned on a balanced subset containing both base and novel classes. This improves generalization to novel classes while retaining performance on base classes.
  • Evaluation Revisions: The authors show that existing evaluation protocols are unreliable because the few training samples have high variance. They propose revised protocols that repeat each experiment over multiple groups of sampled training examples to obtain stable accuracy estimates, and they build new benchmarks on PASCAL VOC, COCO, and LVIS.
  • Numerical Performance: On the revised benchmarks, the approach establishes new state-of-the-art results. On LVIS, it improves precision on rare classes by ~4 points and on common classes by ~2 points, with negligible loss on frequent classes.
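The revised evaluation idea in the list above (sample several different few-shot training sets, then report an aggregate) can be sketched as follows. This is an illustration under stated assumptions, not the paper's benchmark code: `sample_k_shot`, `evaluate_over_runs`, the toy annotation pool, and the `fake_train_and_eval` scorer are hypothetical, each class is treated as a flat list of instance ids, and the 95% interval uses a normal approximation.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Support = Dict[str, List[str]]


def sample_k_shot(annotations: Dict[str, Sequence[str]], k: int, rng: random.Random) -> Support:
    """Draw k instances per class so every class is equally represented."""
    return {cls: rng.sample(list(items), k) for cls, items in annotations.items()}


def evaluate_over_runs(
    annotations: Dict[str, Sequence[str]],
    k: int,
    num_runs: int,
    train_and_eval: Callable[[Support], float],
    base_seed: int = 0,
) -> Tuple[float, float]:
    """Repeat sampling + training + evaluation; return (mean score, 95% CI half-width)."""
    scores = []
    for run in range(num_runs):
        rng = random.Random(base_seed + run)  # a different sample group per run
        support = sample_k_shot(annotations, k, rng)
        scores.append(train_and_eval(support))
    mean = statistics.mean(scores)
    if len(scores) > 1:
        ci95 = 1.96 * statistics.stdev(scores) / len(scores) ** 0.5
    else:
        ci95 = 0.0
    return mean, ci95


# Toy pool: 20 instance ids per class (hypothetical data).
toy_annotations = {
    "dog": [f"dog_{i}" for i in range(20)],
    "bird": [f"bird_{i}" for i in range(20)],
}


def fake_train_and_eval(support: Support) -> float:
    """Placeholder for fine-tuning and scoring; the score depends on which samples were drawn."""
    ids = [int(x.split("_")[1]) for items in support.values() for x in items]
    return sum(ids) / len(ids)


mean_score, ci = evaluate_over_runs(toy_annotations, k=3, num_runs=10,
                                    train_and_eval=fake_train_and_eval)
print(f"mean={mean_score:.2f} +/- {ci:.2f}")
```

A single run of this toy scorer can land far from the mean, which is the kind of sample-dependent variance that repeated sampling is meant to average out.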

Comparative Analysis

The paper compares its approach against previous meta-learning-based methods (e.g., FSRW, Meta R-CNN, MetaDet) and reports better performance across a range of few-shot detection tasks. The authors also introduce instance-level feature normalization, inspired by existing work in few-shot classification, which contributes significantly to the improvement.
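One common way to realize instance-level feature normalization in a classifier is to score each class by the scaled cosine similarity between the L2-normalized instance feature and the L2-normalized class weight vector. The sketch below illustrates that technique in plain Python; the function names and the default scale `alpha=20.0` are assumptions made here for illustration, and the exact formulation used in the paper should be checked against the paper itself.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_logits(
    feature: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    alpha: float = 20.0,  # illustrative scaling factor (assumption)
) -> List[float]:
    """Return one logit per class: alpha * cos(feature, class weight vector)."""
    f = l2_normalize(feature)
    return [
        alpha * sum(a * b for a, b in zip(f, l2_normalize(w)))
        for w in class_weights
    ]


# Hypothetical 2-D features and three class weight vectors.
weights = [[1.0, 0.0], [0.0, 3.0], [-5.0, 0.0]]
small = cosine_logits([1.0, 1.0], weights)
large = cosine_logits([10.0, 10.0], weights)

# The logits depend on direction only, not on feature magnitude.
print(small)
print(large)
```

Because both the feature and the weights are normalized, two instances that point in the same direction get the same scores regardless of their magnitude, which can reduce variance when only a few examples per class are available.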

Implications and Speculations

The implications of this work are twofold. Practically, it offers a simpler and more computationally efficient alternative for few-shot object detection: reuse an existing detector and fine-tune it selectively. Theoretically, it challenges the assumption that sophisticated meta-learning is always superior for few-shot tasks, and it prompts a reconsideration of the trade-off between model complexity and effectiveness in novel-class detection.

Future Directions

Future work can explore combining this fine-tuning approach with other techniques to further improve few-shot learning. Investigating its applicability to settings such as real-time detection or deployment on edge computing devices could also be valuable. Finally, extending the study to diverse domain-specific datasets would further test the method's robustness and adaptability.

In conclusion, the paper makes a compelling case for re-evaluating the role of simplicity and selective fine-tuning in few-shot object detection, and it challenges the expectation that more complex or novel methods necessarily perform better.

Authors (5)
  1. Xin Wang (1307 papers)
  2. Thomas E. Huang (7 papers)
  3. Trevor Darrell (324 papers)
  4. Joseph E. Gonzalez (167 papers)
  5. Fisher Yu (104 papers)
Citations (496)