DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution (2006.02334v2)

Published 3 Jun 2020 in cs.CV

Abstract: Many modern object detectors demonstrate outstanding performances by using the mechanism of looking and thinking twice. In this paper, we explore this mechanism in the backbone design for object detection. At the macro level, we propose Recursive Feature Pyramid, which incorporates extra feedback connections from Feature Pyramid Networks into the bottom-up backbone layers. At the micro level, we propose Switchable Atrous Convolution, which convolves the features with different atrous rates and gathers the results using switch functions. Combining them results in DetectoRS, which significantly improves the performances of object detection. On COCO test-dev, DetectoRS achieves state-of-the-art 55.7% box AP for object detection, 48.5% mask AP for instance segmentation, and 50.0% PQ for panoptic segmentation. The code is made publicly available.

Authors (3)

Siyuan Qiao
Liang-Chieh Chen
Alan Yuille

Citations (671)

Summary

An Analysis of DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution

The paper "DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution" improves object detection by redesigning the backbone network. The authors propose two components: the Recursive Feature Pyramid (RFP) and Switchable Atrous Convolution (SAC).

Core Contributions

The paper applies RFP and SAC at two levels of the backbone design: RFP at the macro level and SAC at the micro level.

Recursive Feature Pyramid (RFP): RFP adds feedback connections from the Feature Pyramid Network (FPN) back into the bottom-up backbone layers. The backbone therefore processes the image more than once, and each pass receives the FPN outputs of the previous pass, much like unrolling a recurrent network for a fixed number of steps. The design follows the idea of "looking and thinking twice" and is loosely analogous to feedback in human vision, which focuses selectively to improve recognition.
Switchable Atrous Convolution (SAC): SAC convolves the same input with two different atrous (dilation) rates and combines the results with a learned, spatially varying switch function, so each location can use a field of view suited to the scale of the objects there. The large-rate branch reuses the original weights plus a learned difference, which lets SAC start from a pretrained backbone; the extra branch and the switch do add computation compared with a single convolution.
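
For reference, the two components can be written compactly. The notation (backbone stage B_i, top-down FPN operation F_i, feedback transform R_i, switch function S, atrous rate r, weight difference Δw) follows the DetectoRS paper as closely as this summary can reconstruct it and should be checked against the original before being relied on. The effective-kernel line is the standard receptive-field arithmetic for dilated convolutions, not a result of the paper.

```latex
% FPN: bottom-up stage i, then top-down combination
x_i = \mathbf{B}_i(x_{i-1}), \qquad f_i = \mathbf{F}_i(f_{i+1}, x_i)

% RFP: FPN outputs are fed back into the backbone, unrolled for t = 1, \dots, T
% (at t = 1 there is no feedback, so the first pass is an ordinary backbone)
x_i^t = \mathbf{B}_i^t\big(x_{i-1}^t,\ \mathbf{R}_i^t(f_i^{t-1})\big), \qquad
f_i^t = \mathbf{F}_i^t\big(f_{i+1}^t,\ x_i^t\big)

% SAC: a standard convolution with weights w and rate 1 is replaced by
\mathrm{Conv}(x, w, 1) \;\longrightarrow\;
S(x)\cdot\mathrm{Conv}(x, w, 1) + \big(1 - S(x)\big)\cdot\mathrm{Conv}(x, w + \Delta w, r)

% Effective kernel size of a k x k convolution with atrous rate r
k_{\mathrm{eff}} = k + (k - 1)(r - 1), \quad \text{e.g. } k = 3,\ r = 3 \Rightarrow k_{\mathrm{eff}} = 7
```

Because the large-rate branch only learns the difference Δw on top of w, a pretrained convolution can be converted to SAC without discarding its weights.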

Quantitative Results

DetectoRS demonstrates substantial improvements over baseline object detection frameworks:

On the COCO test-dev benchmark, it achieves a notable 55.7% box Average Precision (AP) for object detection, 48.5% mask AP for instance segmentation, and 50.0% Panoptic Quality (PQ) in panoptic segmentation.
Compared with the HTC (Hybrid Task Cascade) baseline, DetectoRS improves box AP by 7.7 points and mask AP by 5.9 points, indicating that the two components substantially raise detection accuracy.

Architectural Implications

RFP enriches the backbone by processing the input in multiple passes, so features are refined step by step. The feedback connections also carry features that receive gradients directly from the detector heads back to the lower levels of the bottom-up backbone; the authors relate this to the deep supervision used in Deeply-Supervised Nets and argue that it speeds up training and improves performance.
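
The unrolled recursion can be illustrated with a toy NumPy sketch. Everything in it is an assumption for illustration: 1×1 channel-mixing layers stand in for the ResNet stages, the paper's ASPP-based connecting module R_i is replaced by a single 1×1 mix, the weights are shared across unrolled passes for brevity, and the fusion between consecutive passes is simplified to a sigmoid gate. All names (`backbone`, `fpn`, `rfp`, `W_R`, and so on) are hypothetical. What it does show is the data flow: the first pass is an ordinary backbone plus FPN, and each later pass re-runs the backbone with the previous pass's FPN outputs added at matching resolutions.

```python
import numpy as np

# Toy sketch of the RFP data flow; every layer and weight here is hypothetical.
rng = np.random.default_rng(0)
C = 4          # channels at every level (toy assumption)
N_STAGES = 3   # bottom-up stages, each halving the resolution


def down2(x):
    """2x2 average pooling on a (C, H, W) map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def up2(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def mix(w, x):
    """1x1 convolution, i.e. channel mixing with a (C_out, C_in) matrix."""
    return np.einsum("oc,chw->ohw", w, x)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical weights: stages B_i, FPN laterals, feedback R_i (stand-in for ASPP), fusion gate.
W_B = [0.5 * rng.standard_normal((C, C)) for _ in range(N_STAGES)]
W_LAT = [0.5 * rng.standard_normal((C, C)) for _ in range(N_STAGES)]
W_R = [0.5 * rng.standard_normal((C, C)) for _ in range(N_STAGES)]
W_GATE = 0.5 * rng.standard_normal((C, C))


def backbone(x0, feedback=None):
    """x_i = B_i(x_{i-1}, R_i(f_i)); with feedback=None it is a plain backbone."""
    xs, x = [], x0
    for i in range(N_STAGES):
        z = mix(W_B[i], down2(x))
        if feedback is not None:
            z = z + mix(W_R[i], feedback[i])  # feedback added at the matching resolution
        x = np.maximum(z, 0.0)                # ReLU
        xs.append(x)
    return xs


def fpn(xs):
    """Top-down pathway: f_i = lateral(x_i) + upsample(f_{i+1})."""
    fs = [None] * N_STAGES
    fs[-1] = mix(W_LAT[-1], xs[-1])
    for i in range(N_STAGES - 2, -1, -1):
        fs[i] = mix(W_LAT[i], xs[i]) + up2(fs[i + 1])
    return fs


def rfp(x0, steps=2):
    """Unroll RFP for `steps` passes; weights are shared across passes in this toy."""
    fs = fpn(backbone(x0))                    # first pass: ordinary backbone + FPN
    for _ in range(steps - 1):                # later passes: feed previous FPN outputs back
        new_fs = fpn(backbone(x0, feedback=fs))
        fused = []
        for new, old in zip(new_fs, fs):
            gate = sigmoid(mix(W_GATE, new))  # simplified fusion: sigmoid-gated blend
            fused.append(gate * new + (1.0 - gate) * old)
        fs = fused
    return fs


x0 = rng.standard_normal((C, 32, 32))
print([f.shape for f in rfp(x0, steps=2)])  # [(4, 16, 16), (4, 8, 8), (4, 4, 4)]
```

With `steps=1` the function reduces to a plain backbone plus FPN, which makes it easy to see that RFP only changes what happens on the second and later passes.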

Similarly, SAC lets the detector adjust its receptive field at each location. Because the large-rate branch shares the pretrained convolution weights (plus a learned difference), SAC can be inserted into pretrained backbones such as ResNet variants without training from scratch. The authors additionally place global context modules before and after SAC, which supply image-level information to the layer.
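
A minimal NumPy sketch of a single SAC layer, assuming single-image (C, H, W) arrays and omitting the global context modules. It blends a 3×3 convolution at rate 1 with the same weights plus a difference Δw at a larger rate (rate=3 here, an illustrative choice), using a switch S(x) built, per the paper's description, from a 5×5 average pool followed by a 1×1 convolution. The helper names (`conv2d`, `avg_pool5`, `switch`, `sac`, `effective_kernel`) are hypothetical, and the initialization in the usage lines (Δw = 0, switch weights 0 and bias 1) is an assumption of this sketch, not the authors' code.

```python
import numpy as np


def conv2d(x, w, rate=1):
    """'Same'-padded 2D convolution (cross-correlation) with an atrous rate.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) weights with odd k.
    """
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    pad = rate * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    y = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Kernel taps are spaced `rate` pixels apart: this is the atrous step.
            patch = xp[:, i * rate:i * rate + h, j * rate:j * rate + wd]
            y += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return y


def avg_pool5(x):
    """5x5 average pooling, stride 1, reflect padding (keeps H and W)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (2, 2), (2, 2)), mode="reflect")
    out = np.zeros(x.shape)
    for i in range(5):
        for j in range(5):
            out += xp[:, i:i + h, j:j + wd]
    return out / 25.0


def switch(x, ws, bs):
    """S(x): 5x5 average pool, then a 1x1 conv to one channel (not squashed to [0, 1])."""
    return np.einsum("c,chw->hw", ws, avg_pool5(x)) + bs


def sac(x, w, dw, ws, bs, rate=3):
    """S(x) * Conv(x, w, 1) + (1 - S(x)) * Conv(x, w + dw, rate)."""
    s = switch(x, ws, bs)[None]            # (1, H, W), shared by all output channels
    small = conv2d(x, w, rate=1)           # original field of view
    large = conv2d(x, w + dw, rate=rate)   # enlarged field of view, shared weights + delta
    return s * small + (1.0 - s) * large


def effective_kernel(k, rate):
    """Spatial extent covered by a k x k kernel with the given atrous rate."""
    return k + (k - 1) * (rate - 1)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w = rng.standard_normal((6, 4, 3, 3))   # stands in for a pretrained 3x3 conv
dw = np.zeros_like(w)                   # assumed init: large-rate branch starts at w
ws, bs = np.zeros(4), 1.0               # assumed init: S(x) == 1 everywhere
y = sac(x, w, dw, ws, bs)
print(y.shape, np.allclose(y, conv2d(x, w)))  # (6, 8, 8) True
print(effective_kernel(3, 3))                 # 7
```

With the switch initialized to 1 and Δw = 0, the layer reproduces the original convolution exactly, which is what makes conversion from pretrained weights straightforward; training can then move S(x) away from 1 at locations where the larger rate is useful.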

Practical and Theoretical Implications

Practical Implications: DetectoRS can improve accuracy in detection tasks where precision is pivotal, such as autonomous driving and complex surveillance systems, although the repeated backbone passes and the extra SAC branches increase computation compared with the baseline.
Theoretical Implications: DetectoRS shows that recursive feedback structures and input-adaptive convolutions can both strengthen backbone features, which motivates further work on recursive learning and dynamic feature extraction.

Speculation on Future Developments

Future work on recursive architectures could explore closer integration with attention mechanisms, or feedback loops learned without supervision, by analogy with cognitive systems. Switchable convolutions could be extended to finer, context-specific modulation of feature maps, for example by learning the atrous rates themselves instead of fixing them in advance.

In conclusion, DetectoRS introduces two backbone components, RFP and SAC, with measurable gains in object detection, instance segmentation, and panoptic segmentation. Further research building on these ideas may improve detection on more diverse and challenging datasets.
