- The paper introduces a gliding vertex representation that refines horizontal bounding boxes using offset ratios instead of directly regressing object vertices.
- It employs an obliquity factor to distinguish between horizontal and rotated detections, enhancing accuracy for objects with near-horizontal orientations.
- Integration with Faster R-CNN demonstrates improved mAP on datasets like DOTA and HRSC2016 with minimal computational overhead.
Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection
The paper addresses the limitations of traditional object detection methods when dealing with multi-oriented objects, which are common in fields such as aerial imaging and scene text detection. Traditional approaches based on horizontal bounding boxes often fail to capture the orientation and scale details needed for accurate detection of such objects. The authors present a framework that extends the Faster R-CNN model to detect multi-oriented objects more effectively by introducing a new representation termed the "gliding vertex."
Key Contributions
- Gliding Vertex Representation: Instead of directly regressing the four vertices of an oriented object, the authors let each vertex glide along one side of the object's horizontal bounding box. The representation uses four length ratios to encode these offsets, describing the object's orientation accurately while avoiding the sensitivity to angle-prediction errors that afflicts rotated-box regression.
- Obliquity Factor: To resolve the ambiguity that arises when detecting nearly horizontal objects, the authors introduce an obliquity factor, defined as the ratio between the area of the oriented object and the area of its horizontal bounding box. This factor guides the choice between horizontal and oriented detection for each object, improving prediction accuracy.
- Integration with Faster R-CNN: The proposed method integrates seamlessly into the Faster R-CNN architecture with minimal computational overhead, adding only five target variables (the four gliding offset ratios and the obliquity factor) to the regression head.
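The gliding vertex targets described above can be computed directly from an annotated quadrilateral. The sketch below (not the authors' code; vertex ordering and function names are illustrative assumptions) derives the horizontal bounding box, the four gliding offset ratios, and the obliquity factor, assuming the quadrilateral's vertices are given in order starting from the point on the top side of the box:

```python
def polygon_area(pts):
    """Area of a simple polygon via the shoelace formula."""
    s = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def gliding_vertex_targets(quad):
    """quad: four (x, y) vertices, one per side of the horizontal box,
    ordered top, right, bottom, left. Returns the horizontal box,
    the four gliding ratios, and the obliquity factor."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    xmin, ymin, xmax, ymax = min(xs), min(ys), max(xs), max(ys)
    w, h = xmax - xmin, ymax - ymin
    (tx, _), (_, ry), (bx, _), (_, ly) = quad
    alphas = (
        (tx - xmin) / w,   # glide of the top vertex from the top-left corner
        (ry - ymin) / h,   # glide of the right vertex from the top-right corner
        (xmax - bx) / w,   # glide of the bottom vertex from the bottom-right corner
        (ymax - ly) / h,   # glide of the left vertex from the bottom-left corner
    )
    # Obliquity factor: object area relative to its horizontal box.
    # Near 1 for horizontal objects, small for highly oblique ones.
    r = polygon_area(quad) / (w * h)
    return (xmin, ymin, xmax, ymax), alphas, r
```

For a square rotated by 45 degrees, e.g. `[(5, 0), (10, 5), (5, 10), (0, 5)]`, all four ratios come out to 0.5 and the obliquity factor is 0.5, reflecting a strongly oriented object; a horizontal rectangle would yield ratios of 0 and an obliquity factor of 1.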
Experimental Evaluation
The proposed method demonstrates superior performance on multiple datasets, including those for aerial images, scene texts, and fisheye pedestrian detection. Specifically, the framework shows significant mAP improvements over state-of-the-art methods on the DOTA and HRSC2016 datasets, achieving 75.02% mAP using FPN on DOTA compared to the next best method's score of 71.16%.
In text detection scenarios, such as MSRA-TD500 and RCTW-17, it outperforms existing methods in precision, recall, and F-measure, highlighting its robustness in detecting long and multi-oriented text lines.
Implications and Future Directions
The gliding vertex method offers a practical enhancement to traditional object detectors without substantial computational trade-offs. It can benefit applications that require precise object orientation details, such as autonomous driving, visual surveillance, and advanced image analysis in remote sensing.
Theoretically, this paper emphasizes the need for new object representation techniques in convolutional neural network-based detectors, suggesting a potential research direction towards more adaptive and rotation-invariant models. Future work could explore the integration of the proposed method with advanced feature refinement modules, further enhancing detection accuracy and efficiency.
In summary, this paper contributes an innovative approach to multi-oriented object detection, challenging the prevalent reliance on angular predictions in rotated bounding boxes and offering an efficient, simple alternative that is versatile across various object detection tasks.