Dense Label Encoding for Boundary Discontinuity Free Rotation Detection (2011.09670v4)

Published 19 Nov 2020 in cs.CV and cs.AI

Abstract: Rotation detection serves as a fundamental building block in many visual applications involving aerial images, scene text, faces, etc. Differing from the dominant regression-based approaches for orientation estimation, this paper explores a relatively less-studied methodology based on classification, with the aim of inherently avoiding the boundary discontinuity issue encountered by regression-based detectors. We propose new techniques to push its frontier in two aspects: i) a new encoding mechanism: two Densely Coded Labels (DCL) for angle classification, replacing the Sparsely Coded Label (SCL) used in existing classification-based detectors, which yields a roughly threefold training speed-up as empirically observed across benchmarks, together with notable improvement in detection accuracy; ii) loss re-weighting: we propose Angle Distance and Aspect Ratio Sensitive Weighting (ADARSW), which improves detection accuracy, especially for square-like objects, by making DCL-based detectors sensitive to angular distance and the object's aspect ratio. Extensive experiments and visual analysis on large-scale public datasets for aerial images, i.e., DOTA, UCAS-AOD, and HRSC2016, as well as the scene text datasets ICDAR2015 and MLT, show the effectiveness of our approach. The source code is available at https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow and is also integrated in our open source rotation detection benchmark: https://github.com/yangxue0827/RotationDetection.

Authors (5)

Xue Yang (141 papers)
Liping Hou (4 papers)
Yue Zhou (130 papers)
Wentao Wang (47 papers)
Junchi Yan (241 papers)

Citations (218)

Summary

The paper introduces dense label encoding, using Binary and Gray Coded Labels, to significantly reduce training time.
It presents ADARSW, a novel weighting strategy that adjusts for angle distance and aspect ratios to enhance detection of square-like objects.
Experimental results on datasets such as DOTA and ICDAR2015 show improved accuracy and efficiency over traditional regression approaches.

Dense Label Encoding for Boundary Discontinuity Free Rotation Detection

This paper focuses on improving rotation detection in computer vision applications such as aerial imagery, scene text, and face detection, where precise estimation of object orientation is crucial. Rather than following the dominant regression-based methods, which suffer from boundary discontinuity issues, it explores classification-based techniques and advances them through Densely Coded Labels (DCL) and a new loss weighting strategy.
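To make the boundary discontinuity concrete, the toy comparison below assumes the long-side definition with angles in [-90°, 90°), where θ and θ + 180° describe the same box. The function names are illustrative and not taken from the paper's code: a plain L1 error on the raw angle treats 89° and -89° as far apart, while their true periodic distance is only 2°, which is the kind of jump a regression-based detector is forced to learn across the range boundary.

```python
def naive_angle_error(pred_deg: float, gt_deg: float) -> float:
    """Plain L1 regression error on the raw angle value (ignores periodicity)."""
    return abs(pred_deg - gt_deg)


def periodic_angle_error(pred_deg: float, gt_deg: float, period: float = 180.0) -> float:
    """Smallest angular distance when angles repeat every `period` degrees."""
    diff = (pred_deg - gt_deg) % period
    return min(diff, period - diff)


if __name__ == "__main__":
    # Assumed long-side definition: angles lie in [-90, 90), so 89 and -89 degrees
    # describe boxes that differ by only 2 degrees of rotation.
    gt, pred = 89.0, -89.0
    print(naive_angle_error(pred, gt))     # 178.0
    print(periodic_angle_error(pred, gt))  # 2.0
```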

Key Contributions

The paper makes significant contributions in two principal areas:

Dense Label Encoding: The authors propose two DCLs, the Binary Coded Label (BCL) and the Gray Coded Label (GCL), to make angle classification more efficient. They replace the Sparsely Coded Labels (SCLs) of earlier classification-based detectors: a sparse label over N angle bins requires N output channels per prediction, whereas a binary or Gray code indexes the same bins with only ⌈log₂ N⌉ bits. This much thinner prediction layer is reported to make training about three times faster while maintaining or improving detection accuracy.
Angle Distance and Aspect Ratio Sensitive Weighting (ADARSW): This weighting mechanism makes the angle loss sensitive to the angular distance between prediction and ground truth and to the object's aspect ratio, which particularly benefits square-like objects. Because it adapts to the aspect ratio, it mitigates a weakness of the long-side bounding box definition used by previous classification-based methods: for a nearly square object, a box rotated by about 90° covers almost the same region, yet its angle label differs substantially.
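The difference between sparse and dense angle labels can be sketched as follows. This is a minimal illustration, assuming a 180° angle range split into 1° bins, with the bin index written either in plain binary (in the spirit of BCL) or as a reflected Gray code (in the spirit of GCL). The bin width, the angle range, and all function names are illustrative assumptions, not the paper's exact configuration.

```python
import math


def num_bits(num_bins: int) -> int:
    """Code length needed to index `num_bins` angle bins densely."""
    return math.ceil(math.log2(num_bins))


def angle_to_bin(angle_deg: float, low: float = -90.0, omega: float = 1.0) -> int:
    """Quantize an angle in [low, low + 180) into a bin index of width `omega` degrees."""
    return int((angle_deg - low) // omega)


def binary_code(index: int, bits: int) -> list[int]:
    """Binary-coded label: the bin index in plain binary, most significant bit first."""
    return [(index >> (bits - 1 - i)) & 1 for i in range(bits)]


def gray_code(index: int, bits: int) -> list[int]:
    """Gray-coded label: reflected Gray code of the bin index, most significant bit first."""
    return binary_code(index ^ (index >> 1), bits)


def gray_to_index(code: list[int]) -> int:
    """Decode a reflected Gray code (MSB first) back into the bin index."""
    value, bit = 0, 0
    for g in code:
        bit ^= g  # binary bit i is the XOR of Gray bits 0..i
        value = (value << 1) | bit
    return value


def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    n_bins = 180                                   # assumed: 1-degree bins over 180 degrees
    bits = num_bits(n_bins)                        # 8 outputs instead of 180 for a sparse label
    i, j = angle_to_bin(37.0), angle_to_bin(38.0)  # neighbouring bins 127 and 128
    print(bits)                                                 # 8
    print(hamming(binary_code(i, bits), binary_code(j, bits)))  # 8
    print(hamming(gray_code(i, bits), gray_code(j, bits)))      # 1
    print(gray_to_index(gray_code(i, bits)) == i)               # True
```

Both dense codes shrink the angle output from 180 values to 8 per prediction in this setting; the Gray variant additionally guarantees that adjacent angle bins differ in exactly one bit, whereas plain binary can flip every bit at a power-of-two boundary such as bins 127 and 128.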

Experimental Validation

The performance of the proposed methods is evaluated on several datasets, including DOTA, UCAS-AOD, HRSC2016, ICDAR2015, and MLT. The experiments demonstrate improved accuracy and efficiency compared with traditional regression-based and CSL-based detectors. Notably, detection accuracy on DOTA improved by approximately 1.2 percentage points, from 76.17% to 77.37% mAP.

Implications and Future Directions

This research presents a potentially impactful shift in how rotation detection is approached, moving towards classification-based methods that eliminate boundary discontinuity. The proposed DCL effectively balances computational efficiency with detection precision, making it a promising baseline for future models.

ADARSW further improves the model's robustness by addressing the difficulties posed by square-like objects. This could lead to more accurate systems in practical applications such as aerial monitoring, autonomous navigation, and recognition of text in arbitrary orientations.
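The formula behind ADARSW is not given here, so the sketch below is only a rough illustration of how a weight can be sensitive to both angular distance and aspect ratio. It assumes a hypothetical weight |sin(α·Δθ)| that multiplies an existing angle loss, with α = 1 (a 180° period) for elongated boxes and α = 2 (a 90° period) when the aspect ratio, taken here as long side over short side, falls below an assumed threshold of 1.5. The functional form, the threshold, and the names `angle_weight` and `weighted_angle_loss` are assumptions and may differ from the paper's formulation.

```python
import math


def angle_weight(pred_deg: float, gt_deg: float, aspect_ratio: float,
                 square_threshold: float = 1.5) -> float:
    """Hypothetical angle-distance / aspect-ratio weight (illustrative only).

    aspect_ratio is long side / short side (>= 1). alpha = 1 gives a 180-degree
    period for elongated boxes; alpha = 2 gives a 90-degree period, so a nearly
    square box predicted 90 degrees off receives (almost) zero weight.
    """
    alpha = 2.0 if aspect_ratio < square_threshold else 1.0
    delta = math.radians(pred_deg - gt_deg)
    return abs(math.sin(alpha * delta))


def weighted_angle_loss(raw_loss: float, pred_deg: float, gt_deg: float,
                        aspect_ratio: float) -> float:
    """Scale an existing per-box angle classification loss by the weight above."""
    return raw_loss * angle_weight(pred_deg, gt_deg, aspect_ratio)


if __name__ == "__main__":
    print(angle_weight(90.0, 0.0, aspect_ratio=4.0))  # elongated box: maximum weight
    print(angle_weight(90.0, 0.0, aspect_ratio=1.1))  # near-square box: close to zero
```

With α = 2, a near-square box predicted 90° away from its label is barely penalized, reflecting that the two boxes nearly coincide, while an elongated box with the same angular error receives the maximum weight of 1.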

Future research could further optimize the encoding schemes, for example with network architectures that natively handle angle periodicity and aspect-ratio variation. Integrating these techniques with more complex data scenarios, such as multilingual scene text and variable environmental conditions, could also extend their applicability.

Overall, the paper provides an insightful contribution to overcoming key challenges in rotation detection, with robust methodologies that promise to facilitate advancements in various computer vision applications.
