PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector (2211.02386v1)

Published 4 Nov 2022 in cs.CV

Abstract: Arbitrary-oriented object detection is a fundamental task in visual scenes involving aerial images and scene text. In this report, we present PP-YOLOE-R, an efficient anchor-free rotated object detector based on PP-YOLOE. We introduce a bag of useful tricks in PP-YOLOE-R to improve detection precision with marginal extra parameters and computational cost. As a result, PP-YOLOE-R-l and PP-YOLOE-R-x achieve 78.14 and 78.28 mAP respectively on DOTA 1.0 dataset with single-scale training and testing, which outperform almost all other rotated object detectors. With multi-scale training and testing, PP-YOLOE-R-l and PP-YOLOE-R-x further improve the detection precision to 80.02 and 80.73 mAP. In this case, PP-YOLOE-R-x surpasses all anchor-free methods and demonstrates competitive performance to state-of-the-art anchor-based two-stage models. Further, PP-YOLOE-R is deployment friendly and PP-YOLOE-R-s/m/l/x can reach 69.8/55.1/48.3/37.1 FPS respectively on RTX 2080 Ti with TensorRT and FP16-precision. Source code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection, which is powered by https://github.com/PaddlePaddle/Paddle.

Authors (6)
  1. Xinxin Wang
  2. Guanzhong Wang
  3. Qingqing Dang
  4. Yi Liu
  5. Xiaoguang Hu
  6. Dianhai Yu
Citations (15)

Summary

The paper presents PP-YOLOE-R, an efficient anchor-free rotated object detector that extends PP-YOLOE to arbitrary-oriented object detection, a task that arises in aerial imagery and scene text. The authors introduce a set of enhancements that improve detection precision while adding only marginal parameters and computational cost.

Key Contributions

PP-YOLOE-R adds the following components to PP-YOLOE:

  1. ProbIoU Loss: Following FCOSR, the paper uses ProbIoU loss for box regression to mitigate the boundary discontinuity inherent in direct angle regression. ProbIoU models each rotated bounding box as a 2D Gaussian distribution and compares the predicted and ground-truth Gaussians, so two boxes that cover the same region yield the same loss even when their angle parameters differ by a full period.
  2. Rotated Task Alignment Learning: This technique adapts Task Alignment Learning to rotated object detection by using SkewIoU (the IoU between rotated boxes) for positive sample selection. The refined selection improves both training and detection precision.
  3. Decoupled Angle Prediction Head: The angle is predicted by a dedicated head, separate from the branch that regresses the other bounding box parameters. The angle is represented with Distribution Focal Loss (DFL), meaning the head predicts a discrete distribution over angle values rather than a single number.
  4. Learnable Gating Unit in RepVGG: A learnable gating unit is inserted into the RepVGG-style blocks to make feature fusion adaptive. During training the gate adjusts how much information is taken from the previous layer, which is described as particularly beneficial for small or densely packed objects.
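
The Gaussian modeling behind ProbIoU (item 1) can be illustrated with a minimal sketch. It is not the PaddleDetection implementation. It assumes boxes are given as `(cx, cy, w, h, theta)` with positive width and height and `theta` in radians, and it follows the general ProbIoU formulation: each box becomes a Gaussian whose covariance comes from the variance of a uniform distribution over the box, and the loss is the Hellinger distance derived from the Bhattacharyya distance between the two Gaussians. The function names `box_to_gaussian` and `prob_iou_loss` are hypothetical.

```python
import math

def box_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box to a 2D Gaussian: mean (cx, cy), covariance [[a, c], [c, b]]."""
    # A uniform distribution over a side of length s has variance s^2 / 12.
    var_w, var_h = w * w / 12.0, h * h / 12.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate the diagonal covariance: R * diag(var_w, var_h) * R^T.
    a = var_w * cos_t ** 2 + var_h * sin_t ** 2
    b = var_w * sin_t ** 2 + var_h * cos_t ** 2
    c = (var_w - var_h) * cos_t * sin_t
    return (cx, cy), (a, b, c)

def prob_iou_loss(box1, box2):
    """Hellinger distance between the two box Gaussians, in [0, 1]."""
    (x1, y1), (a1, b1, c1) = box_to_gaussian(*box1)
    (x2, y2), (a2, b2, c2) = box_to_gaussian(*box2)
    # Averaged covariance used by the Bhattacharyya distance.
    a, b, c = (a1 + a2) / 2.0, (b1 + b2) / 2.0, (c1 + c2) / 2.0
    det = a * b - c * c
    det1 = a1 * b1 - c1 * c1
    det2 = a2 * b2 - c2 * c2
    dx, dy = x1 - x2, y1 - y2
    # Mahalanobis term d^T * inverse(Sigma) * d for the 2x2 averaged covariance.
    maha = (b * dx * dx - 2.0 * c * dx * dy + a * dy * dy) / det
    bhatt_dist = maha / 8.0 + 0.5 * math.log(det / math.sqrt(det1 * det2))
    bhatt_dist = max(bhatt_dist, 0.0)  # guard against tiny negative rounding error
    bhatt_coef = math.exp(-bhatt_dist)
    return math.sqrt(max(1.0 - bhatt_coef, 0.0))

reference = (0.0, 0.0, 4.0, 2.0, 0.0)
# Same region, angle shifted by a full period of the box (pi): loss stays near zero.
print(prob_iou_loss(reference, (0.0, 0.0, 4.0, 2.0, math.pi)))
# Same center and size, rotated by 45 degrees: loss strictly between 0 and 1.
print(prob_iou_loss(reference, (0.0, 0.0, 4.0, 2.0, math.pi / 4)))
```

Because the Gaussian depends only on the region the box covers, swapping width and height while rotating by 90 degrees also gives a near-zero loss. This is the property that sidesteps the boundary discontinuity of regressing the angle directly.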

Evaluation and Results

The experimental evaluation uses the DOTA 1.0 dataset, a benchmark for aerial object detection. With single-scale training and testing, PP-YOLOE-R-l and PP-YOLOE-R-x reach 78.14 and 78.28 mAP respectively, which the authors report as outperforming almost all other rotated object detectors. With multi-scale training and testing, the scores rise to 80.02 mAP for PP-YOLOE-R-l and 80.73 mAP for PP-YOLOE-R-x. In this setting, PP-YOLOE-R-x surpasses all anchor-free methods and is competitive with state-of-the-art anchor-based two-stage models.

Inference speed is also emphasized: on an RTX 2080 Ti with TensorRT and FP16 precision, PP-YOLOE-R-s/m/l/x reach 69.8/55.1/48.3/37.1 FPS respectively.
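
To relate the throughput figures from the abstract to a per-image time budget, the FPS values can be converted to latency. The short sketch below assumes one image is processed at a time, so that latency is simply the reciprocal of throughput. The source does not state the batch size or which pipeline stages the FPS figures cover, so the results are rough estimates only.

```python
# FPS reported in the abstract (RTX 2080 Ti, TensorRT, FP16), keyed by model size.
fps = {"s": 69.8, "m": 55.1, "l": 48.3, "x": 37.1}

# Assumption: one image per forward pass, so latency in ms is 1000 / FPS.
latency_ms = {name: 1000.0 / value for name, value in fps.items()}

for name, ms in latency_ms.items():
    print(f"PP-YOLOE-R-{name}: {ms:.1f} ms per image")
```

Under this assumption the smallest model takes roughly 14 ms per image and the largest roughly 27 ms.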

Implications and Future Work

PP-YOLOE-R balances detection precision against computational cost in rotated object detection. The authors describe it as deployment friendly, and its reported TensorRT results with FP16 precision indicate that it can be served efficiently on NVIDIA GPUs, which makes it attractive for real-world applications that need both high throughput and accuracy.

Possible future work includes evaluating the approach on more diverse datasets and application scenarios, which could broaden the use of PP-YOLOE-R in other domains that require robust orientation-aware detection. Further exploration of adaptive learning schemes and optimization techniques could also improve the model's applicability and efficiency.