PP-YOLOv2: A Practical Object Detector (2104.10419v1)

Published 21 Apr 2021 in cs.CV

Abstract: Being effective and efficient is essential to an object detector for practical use. To meet these two concerns, we comprehensively evaluate a collection of existing refinements to improve the performance of PP-YOLO while almost keep the infer time unchanged. This paper will analyze a collection of refinements and empirically evaluate their impact on the final model performance through incremental ablation study. Things we tried that didn't work will also be discussed. By combining multiple effective refinements, we boost PP-YOLO's performance from 45.9% mAP to 49.5% mAP on COCO2017 test-dev. Since a significant margin of performance has been made, we present PP-YOLOv2. In terms of speed, PP-YOLOv2 runs in 68.9FPS at 640x640 input size. Paddle inference engine with TensorRT, FP16-precision, and batch size = 1 further improves PP-YOLOv2's infer speed, which achieves 106.5 FPS. Such a performance surpasses existing object detectors with roughly the same amount of parameters (i.e., YOLOv4-CSP, YOLOv5l). Besides, PP-YOLOv2 with ResNet101 achieves 50.3% mAP on COCO2017 test-dev. Source code is at https://github.com/PaddlePaddle/PaddleDetection.


Overview of PP-YOLOv2: Enhancements in Practical Object Detection

The paper "PP-YOLOv2: A Practical Object Detector" applies a series of refinements to the PP-YOLO architecture, aiming to raise accuracy while keeping inference time almost unchanged. The authors run an incremental ablation study to measure how much each refinement contributes, and report notable improvements on the COCO2017 benchmark.

Key Contributions and Methodological Improvements

Performance and Efficiency:
- The paper reports an increase in mean Average Precision (mAP) from 45.9% to 49.5% on the COCO2017 test-dev set while maintaining a speed of 68.9 FPS at an input size of 640x640.
- By leveraging the Paddle inference engine optimized with TensorRT and FP16-precision, PP-YOLOv2 achieves an enhanced inference speed of 106.5 FPS, outperforming counterparts such as YOLOv4-CSP and YOLOv5l with similar parameter counts.
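
As a rough check on the two speed figures above, throughput can be converted into per-image latency. The sketch below assumes each FPS figure was measured one image at a time (the abstract states batch size = 1 for the TensorRT run), so latency is simply 1000 / FPS. The helper name `fps_to_latency_ms` is illustrative and does not come from the paper or from PaddleDetection.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput (frames per second) to per-image latency in milliseconds."""
    return 1000.0 / fps


base_fps = 68.9    # reported speed at 640x640 input
trt_fps = 106.5    # reported speed with TensorRT, FP16, batch size 1

base_ms = fps_to_latency_ms(base_fps)
trt_ms = fps_to_latency_ms(trt_fps)
speedup = trt_fps / base_fps  # ratio of the two throughputs

print(f"{base_ms:.2f} ms -> {trt_ms:.2f} ms per image ({speedup:.2f}x)")
```

Under that assumption, the TensorRT/FP16 configuration cuts latency from about 14.5 ms to about 9.4 ms per image, roughly a 1.5x speedup.
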
Enhanced Techniques:
- The detection neck adopts a Path Aggregation Network (PAN), and the Mish activation function is applied in the neck only, not in the backbone.
- The training input resolution was increased, which enlarges objects in the image and helps preserve detail for small ones; the batch size was reduced to stay within memory limits.
- The IoU Aware Branch was revised so that its loss is computed in a soft-label form, which matches the branch's purpose (predicting localization quality) better than the earlier formulation.
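
For readers unfamiliar with Mish, the activation is defined as x * tanh(softplus(x)), where softplus(x) = log(1 + exp(x)). The following is a minimal scalar sketch using only the Python standard library; it illustrates the formula and is not the PaddleDetection implementation.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for v in (-2.0, 0.0, 2.0):
    print(f"mish({v:+.1f}) = {mish(v):+.4f}")
```

Mish is smooth and non-monotonic: it passes large positive inputs through almost unchanged while allowing small negative outputs for negative inputs.
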
Comparative Analysis:
- Compared with detectors of roughly the same parameter count, such as YOLOv4-CSP and YOLOv5l, PP-YOLOv2 reports a better trade-off between speed and accuracy.
- With a ResNet101 backbone, PP-YOLOv2 reaches 50.3% mAP on COCO2017 test-dev, which is comparable to YOLOv5x while running faster.

Experimental Validation

Experiments on the COCO dataset validate the proposed refinements through an incremental ablation study. Each modification was added on top of the previous ones to show its contribution to the overall improvement:

- Utilizing PAN and Mish in combination elevated mAP to 47.1%.
- Increasing input size further raised performance to 49.1% mAP.
- Adjusting the IoU Aware Branch finalized performance gains to 49.5% mAP without compromising speed.
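
To make the IoU Aware Branch adjustment more concrete, here is a minimal sketch of one plausible reading of a soft-label loss: a binary cross-entropy whose target is the IoU between a box and its matched ground-truth box, rather than a hard 0/1 label. The function names, the (x1, y1, x2, y2) box layout, and the example boxes are assumptions made for illustration; the paper and PaddleDetection source should be consulted for the exact formulation.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def iou_xyxy(a, b) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_label_iou_loss(logit: float, iou_target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy with the IoU as a soft target (assumed form)."""
    p = min(max(sigmoid(logit), eps), 1.0 - eps)
    return -(iou_target * math.log(p) + (1.0 - iou_target) * math.log(1.0 - p))


# Hypothetical boxes, used only to exercise the functions.
box = (10.0, 10.0, 50.0, 50.0)
gt_box = (20.0, 20.0, 60.0, 60.0)
t = iou_xyxy(box, gt_box)

for logit in (-3.0, 0.0, 3.0):
    print(f"logit={logit:+.1f}  loss={soft_label_iou_loss(logit, t):.4f}")
```

With a soft target, the loss is smallest when the sigmoid of the branch output equals the IoU itself, so the branch is pushed to predict localization quality directly.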

Implications and Future Directions

The advancements detailed in this paper offer significant implications for the deployment of object detection models in real-world applications where resource constraints and processing speed are critical. By effectively balancing these factors, PP-YOLOv2 provides a robust framework for further exploration and application.

Looking forward, the integration of such model improvements within industry-scale frameworks, as shown through PaddlePaddle, facilitates smoother transitions from development to deployment. Future work might well focus on continuing this trend, optimizing existing architectures for better efficiency, and testing across a more diverse set of conditions to ensure broader applicability.

Conclusion

PP-YOLOv2 represents an evolved step in object detection, improving upon its predecessors by meticulously fine-tuning a series of components to boost performance metrics. The development aligns with practical needs, emphasizing speed and accuracy, and sets a precedent for subsequent object detection endeavors. The research underscores a commitment to refining core technologies, fostering advancements that resonate within both academic and industrial settings.


Authors (13)

Xin Huang (222 papers)
Xinxin Wang (24 papers)
Wenyu Lv (8 papers)
Xiaying Bai (2 papers)
Xiang Long (29 papers)
Kaipeng Deng (4 papers)
Qingqing Dang (15 papers)
Shumin Han (18 papers)
Qiwen Liu (7 papers)
Xiaoguang Hu (18 papers)
Dianhai Yu (37 papers)
Yanjun Ma (29 papers)
Osamu Yoshie (25 papers)


GitHub

GitHub - PaddlePaddle/PaddleDetection: Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection. (12,784 stars)