Learning RoI Transformer for Detecting Oriented Objects in Aerial Images (1812.00155v1)

Published 1 Dec 2018 in cs.CV

Abstract: Object detection in aerial images is an active yet challenging task in computer vision because of the birdview perspective, the highly complex backgrounds, and the variant appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between the Region of Interests (RoIs) and objects. This leads to the common misalignment between the final object classification confidence and localization accuracy. Although rotated anchors have been used to tackle this problem, the design of them always multiplies the number of anchors and dramatically increases the computational complexity. In this paper, we propose a RoI Transformer to address these problems. More precisely, to improve the quality of region proposals, we first designed a Rotated RoI (RRoI) learner to transform a Horizontal Region of Interest (HRoI) into a Rotated Region of Interest (RRoI). Based on the RRoIs, we then proposed a Rotated Position Sensitive RoI Align (RPS-RoI-Align) module to extract rotation-invariant features from them for boosting subsequent classification and regression. Our RoI Transformer is with light weight and can be easily embedded into detectors for oriented object detection. A simple implementation of the RoI Transformer has achieved state-of-the-art performances on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a neglectable reduction to detection speed. Our RoI Transformer exceeds the deformable Position Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments have also validated the flexibility and effectiveness of our RoI Transformer. The results demonstrate that it can be easily integrated with other detector architectures and significantly improve the performances.

Authors (5)

Jian Ding (132 papers)
Nan Xue (61 papers)
Yang Long (61 papers)
Gui-Song Xia (139 papers)
Qikai Lu (6 papers)

Citations (166)

Summary

This paper proposes the RoI Transformer, a module designed to improve the detection of oriented and densely packed objects in aerial imagery. Aerial images often have complex backgrounds and objects at arbitrary orientations, so horizontal proposals frequently misalign with the objects they are meant to cover, which in turn produces a mismatch between classification confidence and localization accuracy. Rotated anchors have been used to address this, but they multiply the number of anchors and sharply increase computational cost.
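The anchor-count argument can be made concrete with a little arithmetic. The sketch below is only illustrative: the numbers of scales, aspect ratios, and angles, and the feature-map size, are assumptions chosen for the example, not settings taken from the paper.

```python
def anchors_per_location(num_scales, num_ratios, num_angles=1):
    """Number of anchors placed at a single feature-map location."""
    return num_scales * num_ratios * num_angles

# Hypothetical settings, chosen only to illustrate the scaling.
horizontal = anchors_per_location(num_scales=3, num_ratios=3)
rotated = anchors_per_location(num_scales=3, num_ratios=3, num_angles=6)

feature_map_cells = 64 * 64  # assumed feature-map resolution
total_horizontal = horizontal * feature_map_cells
total_rotated = rotated * feature_map_cells

print(f"horizontal anchors: {total_horizontal}")
print(f"rotated anchors:    {total_rotated}")
```

Under these assumptions, adding six angle bins multiplies the anchor count by six. The growth is multiplicative in the number of angles, which is the cost the RoI Transformer is meant to avoid.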

The RoI Transformer has two parts. A Rotated RoI (RRoI) learner transforms each Horizontal Region of Interest (HRoI) into an RRoI, and a Rotated Position Sensitive RoI Align (RPS-RoI-Align) module then extracts rotation-invariant features from the RRoIs for the subsequent classification and regression. The design is lightweight, so it can be embedded in existing detectors with only a negligible reduction in detection speed.
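The HRoI-to-RRoI transformation described above can be sketched as a box-decoding step. This is a minimal illustration, not the paper's implementation: it assumes a common center/size/angle box parameterization, treats the HRoI as having angle zero, and uses a hypothetical function name `decode_rroi` and hypothetical offset values. The paper's exact parameterization may differ.

```python
import math

def decode_rroi(hroi, offsets):
    """Turn a horizontal RoI plus predicted offsets into a rotated RoI.

    hroi:    (cx, cy, w, h), axis-aligned, treated as angle 0 (assumption).
    offsets: (tx, ty, tw, th, ttheta), hypothetical outputs of an RRoI learner.
    Returns (cx, cy, w, h, theta) with theta in [0, 2*pi).
    """
    cx, cy, w, h = hroi
    tx, ty, tw, th, ttheta = offsets
    new_cx = cx + tx * w                # shift the center, scaled by box width
    new_cy = cy + ty * h                # shift the center, scaled by box height
    new_w = w * math.exp(tw)            # log-space width scaling
    new_h = h * math.exp(th)            # log-space height scaling
    theta = (ttheta * 2.0 * math.pi) % (2.0 * math.pi)  # normalized angle
    return (new_cx, new_cy, new_w, new_h, theta)

# Example with made-up offsets: shift the center and rotate by an eighth of a turn.
rroi = decode_rroi((100.0, 50.0, 40.0, 20.0), (0.1, -0.2, 0.0, 0.0, 0.125))
print(rroi)
```

With all offsets set to zero, the function returns the original HRoI with angle zero, so the learner only has to predict a correction per proposal instead of relying on a dense set of rotated anchors.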

Technical Contributions

The paper outlines three primary contributions of the RoI Transformer:

Supervised RRoI Learner: Employing a learnable module for transforming HRoIs to RRoIs, the design mitigates the misalignment issues prevalent in existing methods and obviates the need for a large number of rotated anchors.
Rotated Position Sensitive RoI Align: This module extracts rotation-invariant features from the RRoIs, which aids object classification and location regression. Combined with a light-head RoI-wise operation, it keeps computational complexity low.
Enhanced Performance: Extensive experiments on the DOTA and HRSC2016 datasets have shown the RoI Transformer achieving state-of-the-art performance in oriented object detection. Furthermore, the RoI Transformer can be seamlessly integrated into other detector architectures to yield significant improvements in detection performance.
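The idea behind sampling features from a rotated region can be sketched as follows. This is a simplified illustration under stated assumptions: it only computes where the centers of a k x k grid of bins fall in image coordinates for a given RRoI, using a standard 2D rotation. An actual RPS-RoI-Align module would additionally interpolate position-sensitive score maps at such points and pool them per bin. The function name `rotated_bin_centers` and the `(cx, cy, w, h, theta)` layout are hypothetical.

```python
import math

def rotated_bin_centers(rroi, k):
    """Image-space centers of a k x k grid of bins laid over a rotated RoI.

    rroi: (cx, cy, w, h, theta) with theta in radians (assumed layout).
    Returns a row-major list of (x, y) tuples.
    """
    cx, cy, w, h, theta = rroi
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    centers = []
    for i in range(k):          # rows, along the RoI's height axis
        for j in range(k):      # columns, along the RoI's width axis
            # Bin center in the RoI's own frame, origin at the RoI center.
            u = (j + 0.5) / k * w - w / 2.0
            v = (i + 0.5) / k * h - h / 2.0
            # Rotate into image coordinates, then translate to the RoI center.
            x = cx + u * cos_t - v * sin_t
            y = cy + u * sin_t + v * cos_t
            centers.append((x, y))
    return centers

# With theta = 0 the grid is axis-aligned; a nonzero theta rotates it with the object.
upright = rotated_bin_centers((10.0, 10.0, 4.0, 2.0, 0.0), k=2)
turned = rotated_bin_centers((10.0, 10.0, 4.0, 2.0, math.pi / 2), k=2)
```

Because the grid rotates with the RRoI, the same bin always covers the same part of the object regardless of its orientation in the image, which is the intuition behind the rotation-invariant features mentioned above.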

Results and Implications

The RoI Transformer improves substantially on existing methods. For instance, accuracy and recall improve notably for small, densely packed vehicle instances. These results suggest potential applications in areas requiring precise aerial surveillance, such as urban planning and traffic monitoring, where object orientation and density are critical factors.

In theoretical terms, the RoI Transformer underscores the importance of handling orientation variation and of extracting features that remain reliable in densely packed regions. Because the module integrates readily with contemporary object detectors, it may also inform future work on more complex models that require rotation-invariant features.

Future Directions

While the RoI Transformer provides a solid foundation for improved oriented object detection in aerial images, further research might explore optimizing RRoI learner frameworks and feature extraction mechanisms. Another avenue for enhancement could involve extending this work to incorporate temporal aspects in aerial video footage, potentially leveraging real-time dynamic RoI transformations.

In summary, this work represents a meaningful advancement in the field of aerial image detection, with practical implications for precision monitoring systems and theoretical contributions to adaptive learning techniques in computer vision.