
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN (1803.06799v3)

Published 19 Mar 2018 in cs.CV

Abstract: Recent region-based object detectors are usually built with separate classification and localization branches on top of shared feature extraction networks. In this paper, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives result from classification instead of localization. We conjecture that: (1) Shared feature representation is not optimal due to the mismatched goals of feature learning for classification and localization; (2) multi-task learning helps, yet optimization of the multi-task loss may result in sub-optimal for individual tasks; (3) large receptive field for different scales leads to redundant context information for small objects. We demonstrate the potential of detector classification power by a simple, effective, and widely-applicable Decoupled Classification Refinement (DCR) network. DCR samples hard false positives from the base classifier in Faster RCNN and trains a RCNN-styled strong classifier. Experiments show new state-of-the-art results on PASCAL VOC and COCO without any bells and whistles.

Authors (6)
  1. Bowen Cheng (23 papers)
  2. Yunchao Wei (151 papers)
  3. Honghui Shi (22 papers)
  4. Rogerio Feris (105 papers)
  5. Jinjun Xiong (118 papers)
  6. Thomas Huang (48 papers)
Citations (197)

Summary

  • The paper introduces a Decoupled Classification Refinement (DCR) network that enhances classification accuracy in Faster RCNN by disentangling feature and optimization tasks.
  • It demonstrates that suboptimal feature sharing and fixed receptive fields cause hard false positives in object detection.
  • Empirical results on PASCAL VOC and COCO show significant mAP improvements, confirming the method’s practicality in modern detection frameworks.

Decoupled Classification Refinement in Object Detection

The paper "Revisiting RCNN: On Awakening the Classification Power of Faster RCNN" examines prevalent issues in state-of-the-art region-based object detectors, primarily focusing on Faster RCNN. The authors address the challenges related to classification accuracy, proposing a straightforward yet impactful Decoupled Classification Refinement (DCR) network to enhance detection performance. This summary explores the main findings and contributions of the paper, focusing on its theoretical implications and practical applications within the object detection landscape.

Region-based convolutional neural networks (CNNs), specifically the Faster RCNN, have made significant strides in object detection due to their efficiency and accuracy. However, the paper identifies that hard false positives in detection are primarily attributed to classification errors rather than localization issues. This discrepancy arises from several factors:

  1. Suboptimal Feature Sharing: The multitask nature of Faster RCNN, with shared features for classification and localization, presents inherent conflicts. The classification task benefits from translation-invariant features, whereas localization requires translation-covariant ones.
  2. Suboptimal Multitask Optimization: The shared optimization task can lead to suboptimal solutions for individual components. While multitask learning has shown advantages, it might not fully harness the potential of modern, powerful backbones like ResNets.
  3. Fixed Receptive Fields: Fixed receptive fields of deep CNNs in detectors can lead to suboptimal focus, especially for small objects, due to excessive background context that hampers classification accuracy.

The proposed DCR network builds on insights from the classic RCNN model, advocating for methodological improvements:

  • Decoupled Features: By separating the features used for classification and localization, DCR allows each task to fully leverage its specific requirements without interference.
  • Decoupled Optimization: DCR introduces a two-stage optimization process that treats classification and localization losses independently, ensuring focused refinement for each task.
  • Adaptive Receptive Fields: Adopting adaptive receptive fields by resizing region proposals ensures the network maintains attention on relevant object areas, minimizing superfluous context from backgrounds.
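The two ingredients described above, sampling hard false positives from the base detector and resizing each proposal to a fixed input size, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the score and IoU thresholds, the class-agnostic overlap test, and the nearest-neighbor resize are all assumptions made for the example.

```python
import numpy as np

def iou(box, gt):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_box + area_gt - inter
    return inter / union if union > 0 else 0.0

def select_hard_false_positives(boxes, scores, gt_boxes,
                                score_thresh=0.3, iou_thresh=0.5):
    """Return indices of confident detections that match no ground truth.

    Thresholds are illustrative assumptions; the overlap test ignores
    class labels to keep the sketch short.
    """
    hard = []
    for i, (box, score) in enumerate(zip(boxes, scores)):
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        if score >= score_thresh and best < iou_thresh:
            hard.append(i)
    return hard

def crop_and_resize(image, box, size=224):
    """Crop a box from an HxWxC image and resize it to size x size.

    Every proposal is mapped to the same input resolution, so a small
    object fills the classifier's input instead of being surrounded by
    background context. Nearest-neighbor sampling is used for brevity.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    x1, y1 = min(max(0, x1), w - 1), min(max(0, y1), h - 1)
    x2, y2 = min(w, max(x2, x1 + 1)), min(h, max(y2, y1 + 1))
    crop = image[y1:y2, x1:x2]
    rows = np.arange(size) * crop.shape[0] // size
    cols = np.arange(size) * crop.shape[1] // size
    return crop[rows][:, cols]
```

In a training loop, the indices returned by `select_hard_false_positives` would pick which proposals to pass through `crop_and_resize` before they are fed to the separate classifier.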

Empirical validation on the PASCAL VOC and COCO datasets demonstrates significant mAP improvements across various detector architectures, including Faster RCNN, Feature Pyramid Networks (FPN), and Deformable ConvNets (DCN). The authors report a nearly threefold reduction in hard false positives and new state-of-the-art results without intricate tuning or additional data augmentation. Because DCR is modular, it can be integrated into existing detection frameworks, where it provides consistent gains by refining classification alone.
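This summary does not state how the refined classification scores are merged with the base detector's scores at inference time. One simple option, assumed here purely for illustration, is an element-wise product of the per-class probabilities, so that a detection keeps a high score only when both classifiers agree. The function name `fuse_scores` is hypothetical.

```python
import numpy as np

def fuse_scores(base_probs, refine_probs):
    """Combine per-class probabilities from two classifiers.

    Both inputs have shape (num_detections, num_classes). The
    element-wise product is an assumed fusion rule for this sketch.
    """
    base = np.asarray(base_probs, dtype=float)
    refine = np.asarray(refine_probs, dtype=float)
    if base.shape != refine.shape:
        raise ValueError("score arrays must have the same shape")
    return base * refine
```

Under this rule, a detection that the base detector scores at 0.9 for some class but the refinement classifier scores at 0.1 ends up near 0.09, which is how a confident false positive would be suppressed.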

The paper's contributions extend theoretical understanding and offer practical utility. By decoupling tasks, the research challenges current paradigms in object detection design, advocating for a more task-oriented architecture. The adaptive receptive field insight highlights an evolving area of study, suggesting future exploration into dynamic network structures that better accommodate object variability.

For ongoing and future research, the significant performance gains attributed to DCR accentuate the importance of classification robustness in object detection pipelines. Further studies could explore the scalability of these principles to other model architectures, investigate dynamic feature sharing strategies, and pursue advancements in personalized object detection systems. Additionally, exploring more efficient implementations of adaptive receptive fields might unlock further performance enhancements while maintaining computational practicality.

In essence, revisiting and refining foundational principles in object detector design can lead to heightened classification performance, with broad applications in various computer vision tasks. The Decoupled Classification Refinement network, therefore, represents a crucial step toward more precise, efficient, and robust detection systems.