PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector
The paper presents PP-YOLOE-R, an efficient anchor-free rotated object detector that extends the PP-YOLOE framework to arbitrary-oriented object detection, a task that arises in aerial imagery and scene text. The authors introduce several enhancements that improve detection precision while adding only marginally to parameter count and computational cost.
Key Contributions
PP-YOLOE-R distinguishes itself with the following innovations:
- ProbIoU Loss: Following FCOSR, the paper adopts ProbIoU loss to mitigate the boundary discontinuity that arises when the angle is regressed directly. ProbIoU models each rotated bounding box as a two-dimensional Gaussian distribution and compares the two distributions, so the loss stays smooth where the angle representation wraps around.
- Rotated Task Alignment Learning: This technique adapts Task Alignment Learning (TAL) to rotated object detection by using SkewIoU, the IoU of rotated boxes, when selecting positive samples. This selection improves both training effectiveness and detection precision.
- Decoupled Angle Prediction Head: A dedicated head predicts the angle separately from the other bounding box parameters. The angle is represented with Distribution Focal Loss (DFL), that is, as a discrete distribution over angle values rather than a single regressed number, which yields more accurate orientation estimates.
- Learnable Gating Unit in RepVGG: A learnable gating unit inserted into the RepVGG block enables adaptive feature fusion. Its weight is learned during training and controls how much information from the previous layer is used, which is particularly beneficial for small or densely packed objects.
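To make the Gaussian modelling behind ProbIoU concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual ProbIoU formulation: a box (cx, cy, w, h, theta) becomes a Gaussian whose covariance uses the variances w²/12 and h²/12, and similarity is one minus the Hellinger distance obtained from the Bhattacharyya distance. The function names `box_to_gaussian` and `probiou` are hypothetical and chosen for illustration only.

```python
import math


def box_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box to a 2D Gaussian (assumed ProbIoU-style conversion).

    Returns the mean (cx, cy) and the covariance terms (a, b, c) of
    [[a, c], [c, b]]. w*w/12 and h*h/12 are the variances of a uniform
    distribution along each box axis, rotated by theta (radians).
    """
    vw, vh = w * w / 12.0, h * h / 12.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    a = vw * cos_t ** 2 + vh * sin_t ** 2
    b = vw * sin_t ** 2 + vh * cos_t ** 2
    c = (vw - vh) * sin_t * cos_t
    return (cx, cy), (a, b, c)


def probiou(box1, box2):
    """Similarity in [0, 1] between two rotated boxes with positive w and h."""
    (x1, y1), (a1, b1, c1) = box_to_gaussian(*box1)
    (x2, y2), (a2, b2, c2) = box_to_gaussian(*box2)

    # Averaged covariance and the determinants needed by the distance.
    a, b, c = (a1 + a2) / 2.0, (b1 + b2) / 2.0, (c1 + c2) / 2.0
    det = a * b - c * c
    det1 = a1 * b1 - c1 * c1
    det2 = a2 * b2 - c2 * c2

    # Mahalanobis term d^T S^{-1} d, using the closed-form 2x2 inverse.
    dx, dy = x1 - x2, y1 - y2
    maha = (b * dx * dx - 2.0 * c * dx * dy + a * dy * dy) / det

    # Bhattacharyya distance, coefficient, then Hellinger distance.
    bd = 0.125 * maha + 0.5 * math.log(det / math.sqrt(det1 * det2))
    bc = min(1.0, math.exp(-bd))  # clamp guards against rounding above 1
    hd = math.sqrt(1.0 - bc)
    return 1.0 - hd


# The same physical box written two ways: (w, h, theta) and (h, w, theta + 90 deg).
box_a = (50.0, 50.0, 40.0, 10.0, 0.3)
box_b = (50.0, 50.0, 10.0, 40.0, 0.3 + math.pi / 2.0)
print(probiou(box_a, box_b))                            # very close to 1
print(probiou(box_a, (60.0, 50.0, 40.0, 10.0, 0.3)))    # partial overlap
```

The two parameterizations `box_a` and `box_b` describe the same rectangle and map to the same Gaussian, so their similarity is essentially 1 even though the raw angle and size values differ. This is the property that sidesteps the boundary discontinuity of direct angle regression. A loss can be derived from the similarity, for example `1 - probiou(pred, target)`.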
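The learnable gating unit can be illustrated with a small sketch. It assumes the gate is a single learnable scalar `alpha` that scales the 1x1 branch of a RepVGG-style block, y = f(x) + alpha * g(x), with f a 3x3 convolution and g a 1x1 convolution. That form is an assumption made for illustration, as are all names below. Batch normalization and multiple channels are omitted. Because RepVGG-style branches are linear, such a gate can be folded into one 3x3 kernel for inference, so under this assumption it would add no inference cost.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd square kernel)."""
    k = len(kernel) // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += kernel[di + k][dj + k] * image[ii][jj]
            out[i][j] = acc
    return out


def gated_block_train(image, k3, k1, alpha):
    """Training-time form (assumed): y = f(x) + alpha * g(x)."""
    f = conv2d_same(image, k3)      # 3x3 branch
    g = conv2d_same(image, [[k1]])  # 1x1 branch, scaled by the gate below
    return [[fv + alpha * gv for fv, gv in zip(fr, gr)] for fr, gr in zip(f, g)]


def fuse_kernels(k3, k1, alpha):
    """Fold the gated 1x1 branch into the centre tap of the 3x3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += alpha * k1
    return fused


image = [[1.0, 2.0, 0.5, -1.0],
         [0.0, 3.0, 1.5, 2.0],
         [-2.0, 1.0, 0.0, 4.0],
         [1.0, -1.0, 2.5, 0.5]]
k3 = [[0.1, -0.2, 0.3],
      [0.0, 0.5, -0.1],
      [0.2, 0.4, -0.3]]
k1, alpha = 0.7, 0.25  # alpha stands in for the value learned during training

y_train = gated_block_train(image, k3, k1, alpha)
y_fused = conv2d_same(image, fuse_kernels(k3, k1, alpha))
max_diff = max(abs(p - q) for rp, rq in zip(y_train, y_fused) for p, q in zip(rp, rq))
print(max_diff)  # differs only by floating-point rounding
```

Setting `alpha` to 0 removes the 1x1 branch entirely and setting it to 1 recovers an ungated two-branch block, so the scalar interpolates how much of that branch is used.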
Evaluation and Results
The experiments use DOTA 1.0, a benchmark dataset for aerial object detection. With single-scale training and testing, PP-YOLOE-R-l and PP-YOLOE-R-x obtain 78.14 and 78.28 mAP, respectively, surpassing many existing anchor-free rotated object detectors. With multi-scale training and testing, mAP rises to 80.02 for PP-YOLOE-R-l and 80.73 for PP-YOLOE-R-x. Notably, PP-YOLOE-R-x is competitive with two-stage anchor-based models.
The models are also efficient: they run at real-time inference speeds, with the smallest variant reaching 69.8 FPS on an RTX 2080 Ti using TensorRT and FP16 precision.
Implications and Future Work
PP-YOLOE-R balances detection precision and computational efficiency in rotated object detection. Because the models run under TensorRT with FP16 precision, they are straightforward to deploy on GPUs that TensorRT supports, which makes them attractive for real-world applications that need both high throughput and high accuracy.
Future work could evaluate PP-YOLOE-R on more diverse datasets and application scenarios, broadening its use in other domains that require robust orientation-aware detection. Further exploration of adaptive learning schemes and optimization techniques could also improve the model's applicability and efficiency.