R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object (1908.05612v6)

Published 15 Aug 2019 in cs.CV, cs.LG, and eess.IV

Abstract: Rotation detection is a challenging task due to the difficulties of locating the multi-angle objects and separating them effectively from the background. Though considerable progress has been made, for practical settings, there still exist challenges for rotating objects with large aspect ratio, dense distribution and category extremely imbalance. In this paper, we propose an end-to-end refined single-stage rotation detector for fast and accurate object detection by using a progressive regression approach from coarse to fine granularity. Considering the shortcoming of feature misalignment in existing refined single-stage detector, we design a feature refinement module to improve detection performance by getting more accurate features. The key idea of feature refinement module is to re-encode the position information of the current refined bounding box to the corresponding feature points through pixel-wise feature interpolation to realize feature reconstruction and alignment. For more accurate rotation estimation, an approximate SkewIoU loss is proposed to solve the problem that the calculation of SkewIoU is not derivable. Experiments on three popular remote sensing public datasets DOTA, HRSC2016, UCAS-AOD as well as one scene text dataset ICDAR2015 show the effectiveness of our approach. Tensorflow and Pytorch version codes are available at https://github.com/Thinklab-SJTU/R3Det_Tensorflow and https://github.com/SJTU-Thinklab-Det/r3det-on-mmdetection, and R3Det is also integrated in our open source rotation detection benchmark: https://github.com/yangxue0827/RotationDetection.

Summary

The paper presents a refined single-stage rotation detector that progressively refines predictions to handle arbitrary object orientations.
It introduces a novel feature refinement module that realigns bounding box features via interpolation to address misalignment issues.
The model achieves state-of-the-art performance on datasets such as DOTA, HRSC2016, UCAS-AOD, and ICDAR2015, demonstrating high accuracy in challenging conditions.

An Overview of R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object

The paper "R$^3$Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object" addresses three recurring difficulties in rotated object detection: objects with large aspect ratios, dense distributions, and extreme category imbalance. The authors propose an end-to-end refined single-stage rotation detector that regresses boxes progressively from coarse to fine, improving both feature alignment and rotation estimation.

Key Contributions

The main contributions of this paper can be summarized as follows:

Refined Single-Stage Rotation Detector: The authors build a single-stage detector whose initial predictions are passed through one or more refinement stages, progressing from coarse to fine granularity. This helps with objects that have large aspect ratios and arbitrary orientations.
Feature Refinement Module (FRM): A feature refinement module is introduced to counteract the feature misalignment found in existing refined single-stage detectors. The module re-encodes the position of each refined bounding box into the corresponding feature points using pixel-wise feature interpolation, thereby achieving feature reconstruction and alignment.
Approximate Skew Intersection over Union (SkewIoU) Loss: To improve rotation estimation, an approximate SkewIoU loss is proposed. Because SkewIoU itself is not differentiable, the loss takes its gradient direction from a differentiable regression loss and scales its magnitude by a function of SkewIoU, so that training is better aligned with the evaluation metric.
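To make the SkewIoU quantity concrete, the following is a minimal sketch of how the overlap between two rotated boxes can be computed, and of a magnitude term derived from it. It is an illustration under stated assumptions, not the authors' implementation: the box format `(cx, cy, w, h, theta)` with `theta` in radians, the use of Sutherland-Hodgman polygon clipping, and the helper names are all choices made here, and `-log(SkewIoU)` is shown only as one plausible choice for the SkewIoU-dependent factor.

```python
import math

def box_corners(cx, cy, w, h, theta):
    """Corners of a rotated rectangle in counter-clockwise order (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]

def polygon_area(poly):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    area = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        area += x1 * y2 - x2 * y1
    return area / 2.0

def clip(subject, clipper):
    """Sutherland-Hodgman: clip convex `subject` against convex CCW `clipper`."""
    output = subject
    for i in range(len(clipper)):
        if not output:
            break
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]

        def side(p):
            # > 0 when p lies to the left of the directed edge a -> b (inside).
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j], inp[(j + 1) % len(inp)]
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if sp * sq < 0:  # edge p -> q crosses the clipping line
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output

def skew_iou(box_a, box_b):
    """IoU of two rotated boxes given as (cx, cy, w, h, theta)."""
    inter_poly = clip(box_corners(*box_a), box_corners(*box_b))
    inter = abs(polygon_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

def iou_magnitude(iou, eps=1e-6):
    """Assumed magnitude term: large when overlap is poor, zero when IoU is 1."""
    return -math.log(max(iou, eps))

# The same 10-degree angular error applied to an elongated box and a square box.
err = math.radians(10)
long_iou = skew_iou((0, 0, 10, 1, 0.0), (0, 0, 10, 1, err))
square_iou = skew_iou((0, 0, 2, 2, 0.0), (0, 0, 2, 2, err))
print(long_iou, square_iou, iou_magnitude(long_iou))
```

Printing the two values shows that the same angular error costs the elongated box far more overlap than the square one, which is the sensitivity that motivates tying the loss magnitude to SkewIoU for objects with large aspect ratios.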

Methodology

Rotation RetinaNet serves as the baseline framework, adapted for rotation detection by incorporating an additional angular offset in the regression subnetwork. The core regression loss leverages an approximate SkewIoU function to improve alignment with the evaluation metric.
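As an illustration of what "an additional angular offset" means in practice, the sketch below encodes a rotated box relative to an anchor and decodes it again. It assumes the common five-parameter form `(x, y, w, h, theta)` with center offsets normalized by the anchor size, log-scale width and height, and a plain angle difference in radians; the function names are hypothetical and the exact angle convention used by the authors may differ.

```python
import math

def encode(box, anchor):
    """Regression targets of a box (x, y, w, h, theta) relative to an anchor."""
    x, y, w, h, theta = box
    xa, ya, wa, ha, theta_a = anchor
    return (
        (x - xa) / wa,       # center offsets, normalized by anchor size
        (y - ya) / ha,
        math.log(w / wa),    # log-scale size ratios
        math.log(h / ha),
        theta - theta_a,     # the extra angular offset
    )

def decode(offsets, anchor):
    """Inverse of `encode`: apply predicted offsets to an anchor."""
    tx, ty, tw, th, tt = offsets
    xa, ya, wa, ha, theta_a = anchor
    return (
        xa + tx * wa,
        ya + ty * ha,
        wa * math.exp(tw),
        ha * math.exp(th),
        theta_a + tt,
    )

# Progressive regression: the box decoded in one stage becomes the
# reference ("anchor") for the next, so later offsets are smaller.
anchor = (50.0, 50.0, 32.0, 32.0, 0.0)
target = (54.0, 47.0, 80.0, 12.0, 0.6)
coarse = decode((0.1, -0.1, 0.8, -0.9, 0.5), anchor)  # illustrative first-stage output
residual = encode(target, coarse)                      # what the refinement stage must learn
refined = decode(residual, coarse)
```

Here `refined` recovers `target` up to floating-point error, and the `residual` offsets are what a refinement stage would be asked to predict once the coarse box is already close.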

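The Feature Refinement Module (FRM) tackles feature misalignment by reconstructing the feature map after each refinement stage: the position of each refined bounding box is re-encoded into the corresponding feature points through pixel-wise feature interpolation. This keeps the features used by subsequent stages aligned with the refined bounding boxes.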
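The interpolation step can be sketched as follows. This is a simplified illustration, not the module itself: it assumes a single-channel feature map stored as a nested list, a box already expressed in feature-map coordinates, and five sampling points (the box center and its four corners) whose bilinearly interpolated values are summed. The function names are hypothetical, and the convolution layers and multi-channel handling of a real FRM are omitted.

```python
import math

def bilinear(fmap, x, y):
    """Bilinearly interpolate fmap[row][col] at a real-valued location (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp to the valid range so points slightly outside the map are usable.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def key_points(cx, cy, w, h, theta):
    """Center and four corners of a rotated box (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    offsets = [(0, 0), (-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + ox * c - oy * s, cy + ox * s + oy * c) for ox, oy in offsets]

def refined_feature(fmap, box):
    """Sum of interpolated features at the five key points of a refined box."""
    return sum(bilinear(fmap, x, y) for x, y in key_points(*box))

# A 5x5 map whose value is a linear function of position, for easy checking.
fmap = [[col + 10 * row for col in range(5)] for row in range(5)]
value = refined_feature(fmap, (2.0, 2.0, 2.0, 1.0, 0.3))
```

In a full detector the value computed this way would replace or be added to the feature at the location responsible for that box, so that the next stage regresses from features that actually describe the refined box rather than the original anchor.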

The authors systematically evaluate their model across various datasets:

DOTA: A comprehensive dataset for real-world aerial images.
HRSC2016: Focusing on ships with complex orientations.
UCAS-AOD: Inclusive of multiple object categories within aerial imagery.
ICDAR2015: Pertaining to scene text detection.

Experimental Results

The proposed R$^3$Det achieves competitive performance across all aforementioned datasets, outperforming comparable methods in many cases. Key numerical highlights include:

DOTA: Achieves a state-of-the-art mean Average Precision (mAP) of 76.47% on the oriented bounding box (OBB) task, setting a new benchmark for single-stage detectors on this dataset.
HRSC2016: Exhibits significant advancements with an mAP of 96.01% utilizing a ResNet101 backbone.
UCAS-AOD: Scores an mAP of 96.17%, showcasing its effectiveness even in high-density object scenarios.
ICDAR2015: Delivers an Hmean of 89.21% under high-resolution settings, indicating that the approach also transfers to scene text detection.
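For readers unfamiliar with the text-detection metric, Hmean is the harmonic mean of precision and recall (the F1 score). A minimal sketch follows; the function name is hypothetical and the input values are purely illustrative, not figures from the paper.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only: a detector with high precision but lower recall.
score = hmean(0.9, 0.6)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a high Hmean requires precision and recall to be high at the same time.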

Implications and Future Directions

The methodological innovations introduced in this paper hold both practical and theoretical significance. Practically, the R$^3$Det framework offers an efficient tool for rotated object detection in applications such as remote sensing, aerial surveillance, and scene text detection. Theoretically, this work invites further research into refined detection mechanisms and their adaptation to other detection paradigms.

Future avenues of development may involve:

Adaptive Refinement Strategies: Enhancing the refinement process with context-aware adaptations to further improve detection efficiency and accuracy.
Broader Applications: Extending this framework to other challenging detection scenarios, such as medical imaging, where objects can appear at arbitrary orientations.
Real-Time Deployment: Optimizing the model's architecture for deployment on edge devices to facilitate real-time processing in practical applications.

In conclusion, the paper "R$^3$Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object" contributes substantially to the field of object detection, particularly in handling rotated objects. Its combination of feature refinement and progressive regression targets accurate detection while retaining the speed of a single-stage design.
