PP-YOLOE: An evolved version of YOLO (2203.16250v3)

Published 30 Mar 2022 in cs.CV

Abstract: In this report, we present PP-YOLOE, an industrial state-of-the-art object detector with high performance and friendly deployment. We optimize on the basis of the previous PP-YOLOv2, using anchor-free paradigm, more powerful backbone and neck equipped with CSPRepResStage, ET-head and dynamic label assignment algorithm TAL. We provide s/m/l/x models for different practice scenarios. As a result, PP-YOLOE-l achieves 51.4 mAP on COCO test-dev and 78.1 FPS on Tesla V100, yielding a remarkable improvement of (+1.9 AP, +13.35% speed up) and (+1.3 AP, +24.96% speed up), compared to the previous state-of-the-art industrial models PP-YOLOv2 and YOLOX respectively. Further, PP-YOLOE inference speed achieves 149.2 FPS with TensorRT and FP16-precision. We also conduct extensive experiments to verify the effectiveness of our designs. Source code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection.


Overview of PP-YOLOE: An Evolved Version of YOLO

The paper introduces PP-YOLOE, an object detector that builds on its predecessor, PP-YOLOv2. The work aims to improve both accuracy and ease of deployment by adopting an anchor-free paradigm together with a more powerful backbone and neck. It incorporates CSPRepResStage, the ET-head, and the TAL dynamic label assignment algorithm. Four model sizes (s, m, l, x) are provided to suit different practical scenarios.

Key Innovations

Anchor-Free Design: The model moves from an anchor-based to an anchor-free design, which reduces the number of hyper-parameters and improves generalization across datasets. The switch costs about 0.3 AP, attributed to minor inconsistencies between the anchor-based and anchor-free formulations.
Backbone and Neck Architecture: The CSPRepResNet backbone combines residual and dense connections to improve both computational efficiency and accuracy. Its building block, RepResBlock, draws on the TreeNet and VoVNet structures and is re-parameterized into a basic residual block at inference time, so the multi-branch structure used in training is not carried into deployment.
Task Alignment Learning (TAL): TAL is a dynamic label assignment strategy that selects positive samples based on the model's current predictions. It outperforms the alternative assigners ATSS and SimOTA and yields an AP gain of 0.9.
Efficient Task-Aligned Head (ET-head): This head addresses the conflict between the classification and localization tasks by aligning the classification and regression branches, and it is trained with the VFL and DFL loss functions. ET-head adds 0.9 ms of latency in exchange for higher accuracy.
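The re-parameterization mentioned for RepResBlock can be illustrated with a minimal NumPy sketch. It assumes a RepVGG-style training block with two parallel branches, a 3x3 convolution and a 1x1 convolution, whose outputs are summed. Biases, batch normalization, and the activation are omitted for brevity, and the function names `conv2d_same` and `fuse_3x3_and_1x1` are hypothetical helpers, not part of PaddleDetection. The sketch shows only why the two branches can be collapsed into one 3x3 convolution; it is not the authors' implementation.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive stride-1, zero-padded 'same' convolution (cross-correlation).

    x: input of shape (C_in, H, W)
    w: kernel of shape (C_out, C_in, k, k) with odd k
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    # Accumulate the contribution of each kernel tap separately.
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def fuse_3x3_and_1x1(w3, w1):
    """Fold a parallel 1x1 kernel into the centre tap of a 3x3 kernel.

    Convolution is linear, so conv(x, w3) + conv(x, w1) equals
    conv(x, fused) when w1 is added at the centre position of w3.
    """
    fused = w3.copy()
    fused[:, :, 1, 1] += w1[:, :, 0, 0]
    return fused


# Assumed toy sizes: 4 input channels, 6 output channels, 8x8 feature map.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w3 = rng.standard_normal((6, 4, 3, 3))
w1 = rng.standard_normal((6, 4, 1, 1))

two_branch = conv2d_same(x, w3) + conv2d_same(x, w1)   # training-time form
single = conv2d_same(x, fuse_3x3_and_1x1(w3, w1))      # inference-time form
print(np.allclose(two_branch, single))
```

In a real network each branch would typically be followed by batch normalization, which would first be folded into the kernel and bias before the branches are merged; that step is left out of this sketch.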

Numerical Results and Comparisons

PP-YOLOE-l achieves 51.4 mAP on COCO test-dev at 78.1 FPS on a Tesla V100. Compared with PP-YOLOv2 and YOLOX, this is a gain of 1.9 AP with a 13.35% speed-up and a gain of 1.3 AP with a 24.96% speed-up, respectively. With TensorRT and FP16 precision, inference speed reaches 149.2 FPS.
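As a rough sanity check on the speed figures quoted in the abstract, the short calculation below converts FPS to per-image latency and backs out the baseline throughput implied by each speed-up. It assumes that "speed up" means the relative increase in FPS over the baseline; the helper names are hypothetical, and the derived baseline values are implied by this arithmetic rather than quoted from the summary.

```python
def baseline_fps(new_fps, speedup_percent):
    """Baseline FPS implied by a relative FPS speed-up (assumed definition)."""
    return new_fps / (1.0 + speedup_percent / 100.0)


def latency_ms(fps):
    """Average per-image latency in milliseconds for a given throughput."""
    return 1000.0 / fps


pp_yoloe_l_fps = 78.1      # Tesla V100, from the abstract
pp_yoloe_trt_fps = 149.2   # TensorRT + FP16, from the abstract

implied_ppyolov2_fps = baseline_fps(pp_yoloe_l_fps, 13.35)
implied_yolox_fps = baseline_fps(pp_yoloe_l_fps, 24.96)

print(f"PP-YOLOE-l latency:        {latency_ms(pp_yoloe_l_fps):.2f} ms")
print(f"PP-YOLOE-l TensorRT FP16:  {latency_ms(pp_yoloe_trt_fps):.2f} ms")
print(f"Implied PP-YOLOv2 FPS:     {implied_ppyolov2_fps:.1f}")
print(f"Implied YOLOX FPS:         {implied_yolox_fps:.1f}")
```

Seen this way, the 0.9 ms added by the ET-head is a small but non-trivial share of the roughly 12.8 ms per-image budget at 78.1 FPS.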

Theoretical and Practical Implications

PP-YOLOE targets real-time object detection. By combining an anchor-free design with a restructured backbone, neck, and head, it improves on PP-YOLOv2 in both speed and accuracy, and its s/m/l/x variants let practitioners match model size to the computational capability of the target device.

Future Directions

The paper opens avenues for further research in object detection architectures, particularly in optimizing the trade-offs between efficiency and accuracy. Potential expansions could explore additional strategies for dynamic label assignment and enhanced feature aggregation techniques.

Researchers focusing on deployment-oriented models can build on this foundation by investigating hardware-specific optimizations and by extending these architectures and training methods to other tasks, such as semantic segmentation or instance segmentation.

In summary, PP-YOLOE represents a significant step in evolving YOLO-based object detectors, providing a robust framework for both academic exploration and industrial application. The detailed evaluation and thorough comparisons outlined in the paper solidify its contribution to the ongoing development of efficient, high-performing object detection systems.


Authors (11)

Shangliang Xu
Xinxin Wang
Wenyu Lv
Qinyao Chang
Cheng Cui
Kaipeng Deng
Guanzhong Wang
Qingqing Dang
Shengyu Wei
Yuning Du
Baohua Lai

Citations (218)
