Dynamic Refinement Network for Oriented and Densely Packed Object Detection (2005.09973v2)

Published 20 May 2020 in cs.CV

Abstract: Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging because of the following inherent reasons: (1) receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development on this task. To resolve the first two issues, we present a dynamic refinement network that consists of two novel components, i.e., a feature selection module (FSM) and a dynamic refinement head (DRH). Our FSM enables neurons to adjust receptive fields in accordance with the shapes and orientations of target objects, whereas the DRH empowers our model to refine the prediction dynamically in an object-aware manner. To address the limited availability of related benchmarks, we collect an extensive and fully annotated dataset, namely, SKU110K-R, which is relabeled with oriented bounding boxes based on SKU110K. We perform quantitative evaluations on several publicly available benchmarks including DOTA, HRSC2016, SKU110K, and our own SKU110K-R dataset. Experimental results show that our method achieves consistent and substantial gains compared with baseline approaches. The code and dataset are available at https://github.com/Anymake/DRN_CVPR2020.

Authors (8)

Xingjia Pan
Yuqiang Ren
Kekai Sheng
Weiming Dong
Haolei Yuan
Xiaowei Guo
Chongyang Ma
Changsheng Xu

Summary

The paper introduces a Dynamic Refinement Network (DRN) that improves detection accuracy by dynamically adjusting receptive fields for oriented and densely packed objects.
The methodology leverages two key components: a Feature Selection Module that adapts receptive fields and a Dynamic Refinement Head that refines classification and regression outputs.
Empirical evaluations on benchmarks like SKU110K-R and DOTA show significant gains in mean average precision compared to existing baseline models.

The paper "Dynamic Refinement Network for Oriented and Densely Packed Object Detection" addresses the difficulty of detecting oriented and densely packed objects. The difficulty stems mainly from the axis-aligned receptive fields of standard detection models, which adapt poorly to the diverse shapes and orientations of real-world objects. These models are also trained with generic knowledge and may generalize poorly to specific objects at test time, and the scarcity of datasets for training and evaluation compounds the problem.

Methodology Overview

To tackle these issues, the authors introduce a Dynamic Refinement Network (DRN), which encompasses two novel components: the Feature Selection Module (FSM) and the Dynamic Refinement Head (DRH).

Feature Selection Module (FSM): FSM addresses the mismatch between axis-aligned receptive fields and object orientations. It lets neurons adjust their receptive fields to the shapes and orientations of target objects, which improves feature extraction. To do so, the module draws on kernels of several shapes and applies rotation-aware adjustments, so the effective receptive field better matches each object.
Dynamic Refinement Head (DRH): DRH adapts the model at inference time by refining predictions according to the characteristics of each object. It comes in two variants: DRH-C for classification and DRH-R for regression. DRH-C makes feature embeddings more discriminative, while DRH-R directly refines the predicted values, so each test sample receives its own adjustment.
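
The two ideas above can be illustrated with a small, pure-Python sketch. It is not the authors' implementation. It assumes that the FSM fuses several branch features with softmax attention weights at each position, and that DRH-R rescales a base prediction by a bounded, sample-dependent factor. The function names, the `eps` parameter, and the toy inputs are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_features(branch_feats, branch_scores):
    """FSM-style fusion at one spatial position (assumed form).

    branch_feats:  one feature vector per branch, e.g. one per kernel shape.
    branch_scores: one attention logit per branch.
    Returns the attention-weighted sum of the branch vectors.
    """
    weights = softmax(branch_scores)
    dim = len(branch_feats[0])
    return [
        sum(w * feat[i] for w, feat in zip(weights, branch_feats))
        for i in range(dim)
    ]


def refine_regression(base_pred, delta, eps=0.1):
    """DRH-R-style refinement (assumed form).

    Each base value is scaled by a factor between 1 - eps and 1 + eps,
    where the factor depends on a per-sample signal `delta`.
    """
    return [b * (1.0 + eps * math.tanh(d)) for b, d in zip(base_pred, delta)]


# Two branches with equal attention logits contribute equally.
fused = select_features([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
# A zero refinement signal leaves the base prediction unchanged.
refined = refine_regression([10.0, 4.0], [0.0, 0.0])
```

In this sketch the attention logits and the refinement signal are plain inputs. In a real detector they would be predicted by the network from the image features, which is what makes the adjustment object-aware.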

Dataset and Evaluation

To support oriented detection, the authors also contribute a new dataset, SKU110K-R, which relabels the SKU110K dataset with oriented bounding boxes. It supports training and evaluating models that detect tightly packed, oriented objects.
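
To make the notion of an oriented bounding box concrete, the sketch below converts a box given as center, size, and rotation angle into its four corner points. This parameterization is an assumption made for illustration. The text above does not describe the annotation format actually used in SKU110K-R, and the function name `obb_corners` is hypothetical.

```python
import math


def obb_corners(cx, cy, w, h, angle_deg):
    """Return the four (x, y) corners of a rotated rectangle.

    Assumed parameterization: center (cx, cy), width w, height h, and a
    counter-clockwise rotation of angle_deg degrees about the center.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets of the unrotated, axis-aligned box.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset, then translate it to the box center.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


# With a zero angle the result is an ordinary axis-aligned box.
axis_aligned = obb_corners(0.0, 0.0, 2.0, 1.0, 0.0)
# The same box rotated by a quarter turn swaps its width and height.
rotated = obb_corners(0.0, 0.0, 2.0, 1.0, 90.0)
```

An axis-aligned box is the special case with a zero angle, which is why a relabeling with oriented boxes can describe tilted objects more tightly than the original annotations.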

Quantitative evaluations on DOTA, HRSC2016, SKU110K, and the proposed SKU110K-R dataset show that the DRN achieves consistent and substantial gains over baseline approaches, measured in mean average precision (mAP).
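
For readers unfamiliar with the metric, the following sketch shows one simple way to compute average precision for a class and then average it over classes. It is a simplified, non-interpolated variant under stated assumptions. Each benchmark defines its own matching and interpolation rules, which are not reproduced here, and the step that decides whether a detection is a true positive is taken as given. The function names are hypothetical.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Non-interpolated AP for one class.

    is_true_positive: flags for detections sorted by descending confidence.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_tp in enumerate(is_true_positive, start=1):
        if is_tp:
            hits += 1
            # Precision measured at the rank of each true positive.
            precision_sum += hits / rank
    return precision_sum / num_ground_truth


def mean_average_precision(per_class):
    """per_class maps a class name to (flags, num_ground_truth)."""
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps)


# Toy input: three ranked detections for a class with two annotated objects.
ap = average_precision([True, False, True], 2)
```

A false positive ranked above a true positive lowers the precision at that true positive, so the metric rewards detectors that rank correct boxes first.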

Implications and Future Directions

The research has both practical and theoretical implications. Practically, the DRN's refined predictions enable more accurate detection in real-world settings such as aerial imagery and densely packed scenes. Theoretically, the success of dynamic modeling suggests further research into receptive field adaptation and test-time refinement strategies.

Looking ahead, this work opens the way to exploring dynamic refinement in data-constrained settings such as few-shot learning. Integrating this kind of dynamic adaptability into other areas of computer vision and AI might also yield more responsive, context-aware models.

In summary, the authors propose a system that improves the ability of object detection frameworks to handle oriented and densely packed objects. The FSM and DRH deliver practical improvements in detection and point to promising directions for future work on dynamic model architectures.

GitHub

GitHub - Anymake/DRN_CVPR2020: Code and Dataset for CVPR2020 "Dynamic Refinement Network for Oriented and Densely Packed Object Detection"