- The paper introduces a gliding vertex representation that refines horizontal bounding boxes using offset ratios instead of directly regressing object vertices.
- It employs an obliquity factor to distinguish between horizontal and rotated detections, enhancing accuracy for objects with near-horizontal orientations.
- Integration with Faster R-CNN demonstrates improved mAP on datasets like DOTA and HRSC2016 with minimal computational overhead.
Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection
The paper addresses the limitations of traditional object detection methods when dealing with multi-oriented objects, which are common in fields such as aerial imaging and scene text detection. Traditional approaches based on horizontal bounding boxes often fail to capture the orientation and scale details needed for accurate detection of such objects. The authors present a framework that extends the Faster R-CNN model to detect multi-oriented objects more effectively by introducing a new representation termed the "gliding vertex."
Key Contributions
- Gliding Vertex Representation: Instead of directly regressing the four vertices of an oriented object, the authors let each vertex glide along one side of the object's horizontal bounding box. The representation uses four length ratios to encode these offsets, describing the object's orientation accurately while avoiding the sensitivity to angle-prediction errors that afflicts rotated-box regression.
- Obliquity Factor: To resolve the ambiguity that arises when detecting nearly horizontal objects, the authors introduce an obliquity factor, defined as the ratio between the area of the oriented object and the area of its horizontal bounding box. This factor guides the choice between horizontal and oriented detection for each object, improving prediction accuracy.
- Integration with Faster R-CNN: The proposed method integrates seamlessly into the Faster R-CNN architecture with minimal computational overhead, adding only five target variables (the four gliding offset ratios and the obliquity factor) to the regression head.
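The gliding vertex targets described above can be computed directly from an annotated quadrilateral. The sketch below (not the authors' code; vertex ordering and function names are illustrative assumptions) derives the horizontal bounding box, the four gliding offset ratios, and the obliquity factor, assuming the quadrilateral's vertices are given in order starting from the point on the top side of the box:

```python
def polygon_area(pts):
    """Area of a simple polygon via the shoelace formula."""
    s = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def gliding_vertex_targets(quad):
    """quad: four (x, y) vertices, one per side of the horizontal box,
    ordered top, right, bottom, left. Returns the horizontal box,
    the four gliding ratios, and the obliquity factor."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    xmin, ymin, xmax, ymax = min(xs), min(ys), max(xs), max(ys)
    w, h = xmax - xmin, ymax - ymin
    (tx, _), (_, ry), (bx, _), (_, ly) = quad
    alphas = (
        (tx - xmin) / w,   # glide of the top vertex from the top-left corner
        (ry - ymin) / h,   # glide of the right vertex from the top-right corner
        (xmax - bx) / w,   # glide of the bottom vertex from the bottom-right corner
        (ymax - ly) / h,   # glide of the left vertex from the bottom-left corner
    )
    # Obliquity factor: object area relative to its horizontal box.
    # Near 1 for horizontal objects, small for highly oblique ones.
    r = polygon_area(quad) / (w * h)
    return (xmin, ymin, xmax, ymax), alphas, r
```

For a square rotated by 45 degrees, e.g. `[(5, 0), (10, 5), (5, 10), (0, 5)]`, all four ratios come out to 0.5 and the obliquity factor is 0.5, reflecting a strongly oriented object; a horizontal rectangle would yield ratios of 0 and an obliquity factor of 1.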
Experimental Evaluation
The proposed method demonstrates superior performance on multiple datasets, including those for aerial images, scene texts, and fisheye pedestrian detection. Specifically, the framework shows significant mAP improvements over state-of-the-art methods on the DOTA and HRSC2016 datasets, achieving 75.02% mAP using FPN on DOTA compared to the next best method's score of 71.16%.
In text detection scenarios, such as MSRA-TD500 and RCTW-17, it outperforms existing methods in precision, recall, and F-measure, highlighting its robustness in detecting long and multi-oriented text lines.
Implications and Future Directions
The gliding vertex method offers a practical enhancement to traditional object detectors without substantial computational trade-offs. It can benefit applications that require precise object orientation details, such as autonomous driving, visual surveillance, and advanced image analysis in remote sensing.
Theoretically, this paper emphasizes the need for new object representation techniques in convolutional neural network-based detectors, suggesting a potential research direction towards more adaptive and rotation-invariant models. Future work could explore the integration of the proposed method with advanced feature refinement modules, further enhancing detection accuracy and efficiency.
In summary, this paper contributes an innovative approach to multi-oriented object detection, challenging the prevalent reliance on angular predictions in rotated bounding boxes and offering an efficient, simple alternative that is versatile across various object detection tasks.