Learning Modulated Loss for Rotated Object Detection (1911.08299v3)

Published 19 Nov 2019 in cs.CV

Abstract: Popular rotated detection methods usually use five parameters (coordinates of the central point, width, height, and rotation angle) to describe the rotated bounding box and l1-loss as the loss function. In this paper, we argue that the aforementioned integration can cause training instability and performance degeneration, due to the loss discontinuity resulting from the inherent periodicity of angles and the associated sudden exchange of width and height. This problem is further pronounced given the regression inconsistency among five parameters with different measurement units. We refer to the above issues as rotation sensitivity error (RSE) and propose a modulated rotation loss to remove the loss discontinuity. Our new loss is combined with the eight-parameter regression to further solve the problem of inconsistent parameter regression. Experiments show the state-of-the-art performance of our method on the public aerial image benchmarks DOTA and UCAS-AOD. Its generalization abilities are also verified on ICDAR2015, HRSC2016, and FDDB. Qualitative improvements can be seen in Fig 1, and the source code will be released with the publication of the paper.

Authors (5)

Wen Qian
Xue Yang
Silong Peng
Yue Guo
Junchi Yan

Summary

Insights into "Learning Modulated Loss for Rotated Object Detection"

The paper "Learning Modulated Loss for Rotated Object Detection" introduces an approach to improving training stability and detection accuracy in rotated object detection. Rotated object detection matters in domains such as aerial imagery and scene text detection, where axis-aligned bounding boxes cannot localize objects precisely enough.

Core Contributions

The authors identify a prevalent issue in rotated object detection that they call Rotation Sensitivity Error (RSE). This error stems from two problems in the widely used five-parameter representation of bounding boxes (coordinates of the central point, width, height, and rotation angle): loss discontinuity and regression inconsistency. The loss is discontinuous because angle periodicity and the interchange of width and height cause abrupt jumps in its value at the boundary of the angle range. The regression is inconsistent because the five parameters are measured in different units.
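The discontinuity can be illustrated with a small Python sketch. It assumes a five-parameter encoding `(cx, cy, w, h, theta_deg)` with the angle restricted to `[-90, 0)` degrees and an unnormalized L1 loss; these conventions and all function names are illustrative assumptions, not taken from the paper's code. Two boxes that are geometrically about 2 degrees apart receive a large plain L1 loss because their widths and heights are swapped across the angle boundary.

```python
import math

def l1_five_param(pred, target):
    """Plain L1 distance between two (cx, cy, w, h, theta_deg) boxes."""
    return sum(abs(p - t) for p, t in zip(pred, target))

def corners(box):
    """Four corner points of a (cx, cy, w, h, theta_deg) box."""
    cx, cy, w, h, theta = box
    c, s = math.cos(math.radians(theta)), math.sin(math.radians(theta))
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]

def max_corner_gap(box_a, box_b):
    """Largest distance from a corner of box_a to its nearest corner of box_b."""
    pa, pb = corners(box_a), corners(box_b)
    return max(min(math.dist(p, q) for q in pb) for p in pa)

# Hypothetical boundary case: nearly the same rectangle, encoded on
# opposite sides of the angle range with width and height exchanged.
box_a = (50.0, 50.0, 100.0, 20.0, -1.0)
box_b = (50.0, 50.0, 20.0, 100.0, -89.0)

naive_loss = l1_five_param(box_a, box_b)   # 80 + 80 + 88 = 248.0
gap = max_corner_gap(box_a, box_b)         # small: the boxes nearly coincide
print(naive_loss, gap)
```

The large loss for a nearly perfect prediction is the kind of abrupt jump that RSE describes.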

To mitigate these issues, the authors propose the following enhancements:

Modulated Rotation Loss: A new loss function $\ell_{mr}$ is developed to ensure continuity in the loss landscape by providing a mechanism to handle boundary cases arising from angular periodicity and width-height interchange ambiguities.
Eight-parameter Regression: The paper introduces a regression model using eight parameters: the x and y coordinates of the four corners of the bounding box. This representation keeps the regression consistent because all parameters share the same measurement unit. The modulated loss function is adapted to this representation to handle ambiguity in vertex ordering.
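A minimal Python sketch of how the two modulated losses could be written. The formulas are an assumption consistent with the description above (take the minimum of the loss over equivalent encodings of the same box), not the authors' released code. The function names, the `[-90, 0)` degree angle convention, and the unnormalized units are illustrative.

```python
def modulated_loss_5p(pred, target):
    """Five-parameter sketch: pred/target are (cx, cy, w, h, theta_deg)."""
    px, py, pw, ph, pt = pred
    tx, ty, tw, th, tt = target
    center = abs(px - tx) + abs(py - ty)
    # Direct comparison of width, height, and angle.
    direct = center + abs(pw - tw) + abs(ph - th) + abs(pt - tt)
    # Comparison with width and height exchanged and the angle shifted by 90.
    swapped = center + abs(pw - th) + abs(ph - tw) + abs(90.0 - abs(pt - tt))
    return min(direct, swapped)

def modulated_loss_8p(pred, target):
    """Eight-parameter sketch: pred/target are four (x, y) corners in cyclic order."""
    def shifted(k):
        # L1 distance after shifting the predicted vertex order by k positions.
        return sum(
            abs(pred[(i + k) % 4][0] - target[i][0])
            + abs(pred[(i + k) % 4][1] - target[i][1])
            for i in range(4)
        )
    return min(shifted(-1), shifted(0), shifted(1))

# Hypothetical boundary case: nearly the same box on both sides of the angle range.
loss_5p = modulated_loss_5p((50.0, 50.0, 100.0, 20.0, -1.0),
                            (50.0, 50.0, 20.0, 100.0, -89.0))

# Same rectangle, but the predicted vertex order starts one corner later.
gt_corners = [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)]
pred_corners = [(10.0, 0.0), (10.0, 4.0), (0.0, 4.0), (0.0, 0.0)]
loss_8p = modulated_loss_8p(pred_corners, gt_corners)
print(loss_5p, loss_8p)
```

In both cases the minimum picks the encoding under which the prediction is closest to the target, so a near-perfect box at the boundary yields a small loss instead of a large jump.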

Experimental Validation

Experiments are conducted on the aerial image benchmarks DOTA and UCAS-AOD, and generalization is checked on ICDAR2015, HRSC2016, and FDDB. The proposed method, RSDet, reports state-of-the-art performance, including an mAP gain on DOTA, which suggests that addressing RSE matters across varied datasets. Ablation studies examine the contribution of each component: modulated rotation loss, eight-parameter regression, data augmentation, and backbone architecture.

The results indicate increased training stability and improved detection accuracy for both five-parameter and eight-parameter models when the proposed techniques are used. The model is trained end-to-end, and the results across different benchmarks and detector architectures support the method's robustness and generalization.

Implications and Future Directions

The introduction of a modulated loss framework and more consistent parameterization opens new avenues for the object detection community. It reduces the performance degradation linked to RSE, offering a significant improvement for applications requiring high precision in rotated object detection, such as mapping and surveillance.

Future research can be directed towards exploring more intuitive loss modulation strategies and the further customization of regression models to handle cases beyond simple quadrilateral detection. Additionally, integrating the proposed techniques with transformer-based architectures could leverage their potential in complex detection tasks.

In essence, the paper provides evidence that addressing intrinsic issues such as RSE can improve the performance of rotated object detection systems. Beyond the practical gain, it contributes conceptually by identifying and resolving overlooked problems in regression consistency and loss definition.
