PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices (2111.00902v1)

Published 1 Nov 2021 in cs.CV

Abstract: The better accuracy and efficiency trade-off has been a challenging problem in object detection. In this work, we are dedicated to studying key optimizations and neural network architecture choices for object detection to improve accuracy and efficiency. We investigate the applicability of the anchor-free strategy on lightweight object detection models. We enhance the backbone structure and design the lightweight structure of the neck, which improves the feature extraction ability of the network. We improve label assignment strategy and loss function to make training more stable and efficient. Through these optimizations, we create a new family of real-time object detectors, named PP-PicoDet, which achieves superior performance on object detection for mobile devices. Our models achieve better trade-offs between accuracy and latency compared to other popular models. PicoDet-S with only 0.99M parameters achieves 30.6% mAP, which is an absolute 4.8% improvement in mAP while reducing mobile CPU inference latency by 55% compared to YOLOX-Nano, and is an absolute 7.1% improvement in mAP compared to NanoDet. It reaches 123 FPS (150 FPS using Paddle Lite) on mobile ARM CPU when the input size is 320. PicoDet-L with only 3.3M parameters achieves 40.9% mAP, which is an absolute 3.7% improvement in mAP and 44% faster than YOLOv5s. As shown in Figure 1, our models far outperform the state-of-the-art results for lightweight object detection. Code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection.

Overview of PP-PicoDet: Enhancing Real-Time Object Detection on Mobile Devices

The paper presents PP-PicoDet, a family of lightweight, anchor-free object detectors built for mobile devices. Because mobile hardware limits the available compute, a detector has to trade accuracy against latency. The work targets that trade-off by refining the backbone and neck architectures and by improving the label assignment strategy and loss function.

With these architectural and training changes, PP-PicoDet models reach higher accuracy at lower latency than existing lightweight detectors. PicoDet-S improves on YOLOX-Nano and NanoDet, and PicoDet-L improves on YOLOv5s, which shows the approach is practical for mobile deployment.

Key Contributions

Architecture Enhancements:
- The paper adapts the CSP structure to build the CSP-PAN neck, which improves feature extraction with fewer parameters and uses depthwise separable convolutions to enlarge the receptive field.
- A new backbone, Enhanced ShuffleNet (ESNet), builds on ShuffleNetV2 and is designed for mobile inference.
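
To illustrate why depthwise separable convolutions are a cheap way to enlarge the receptive field, the sketch below counts weights for a standard convolution and for its depthwise separable counterpart. The channel width (96) and the kernel sizes (3 and 5) are assumptions chosen for illustration and are not taken from the summary above.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


channels = 96  # hypothetical channel width, for illustration only
for k in (3, 5):
    std = conv_params(channels, channels, k)
    dws = depthwise_separable_params(channels, channels, k)
    print(f"k={k}: standard={std}, depthwise separable={dws}, ratio={std / dws:.1f}x")
```

In this sketch, moving from a 3 x 3 to a 5 x 5 kernel adds weights only to the depthwise term, so the larger receptive field costs far less than it would in a standard convolution.
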
Training Strategies:
- The models use the SimOTA dynamic label assignment strategy with a modified cost matrix: the cost is a weighted combination of Varifocal Loss and GIoU Loss, which improves both accuracy and training stability.
- A one-shot NAS pipeline searches for architectures directly on detection datasets. The search operates at the channel level, which keeps architecture discovery efficient.
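
The modified cost matrix can be sketched as follows. This is a minimal pure-Python illustration, not the PaddleDetection implementation: the function names are hypothetical, the weight `lam=6.0` and the Varifocal defaults `alpha=0.75`, `gamma=2.0` are assumed values, and the remaining SimOTA steps (candidate filtering, dynamic-k selection, conflict resolution) are omitted. Boxes are assumed to have positive area.

```python
import math


def iou_and_giou(a, b):
    """Boxes are (x1, y1, x2, y2) with positive area. Returns (IoU, GIoU)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both a and b.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    return iou, iou - (enclose - union) / enclose


def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-9):
    """Varifocal Loss for predicted score p and IoU-aware target q."""
    p = min(max(p, eps), 1.0 - eps)
    if q > 0:
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    return -alpha * (p ** gamma) * math.log(1.0 - p)


def cost_matrix(pred_boxes, pred_scores, gt_boxes, gt_labels, lam=6.0):
    """cost[i][j] = VFL + lam * GIoU loss for prediction i and ground truth j."""
    costs = []
    for box, scores in zip(pred_boxes, pred_scores):
        row = []
        for gt, label in zip(gt_boxes, gt_labels):
            iou, giou = iou_and_giou(box, gt)
            row.append(varifocal_loss(scores[label], iou) + lam * (1.0 - giou))
        costs.append(row)
    return costs


# One ground-truth box of class 0 and two candidate predictions.
gts, labels = [(10, 10, 50, 50)], [0]
preds = [(12, 12, 48, 52), (40, 40, 90, 90)]
scores = [[0.8], [0.3]]  # per-class scores for each prediction
for i, row in enumerate(cost_matrix(preds, scores, gts, labels)):
    print(f"prediction {i}: cost={row[0]:.3f}")
```

The well-aligned prediction receives the lower cost, so a dynamic assigner that picks the lowest-cost candidates per ground truth would select it as a positive sample first.
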
Performance Evaluation:
- PicoDet-S, with only 0.99M parameters, achieves a mean Average Precision (mAP) of 30.6%. Compared to YOLOX-Nano, that is an absolute 4.8% mAP improvement with 55% lower mobile CPU inference latency; compared to NanoDet, it is an absolute 7.1% mAP improvement.
- PicoDet-L, with only 3.3M parameters, achieves an mAP of 40.9%, an absolute 3.7% improvement over YOLOv5s, and is reported as 44% faster.
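
The figures quoted in the abstract can be unpacked with a little arithmetic. The sketch below derives the baseline mAP values implied by the reported absolute gains, converts the reported FPS into per-frame latency, and shows that "44% faster" yields different numbers depending on whether it describes throughput or latency. The helper names are illustrative, and the abstract quoted here does not say which reading of "faster" is intended.

```python
def latency_ms(fps):
    """Per-frame latency in milliseconds for a given frames-per-second rate."""
    return 1000.0 / fps


def speedup_from_latency_reduction(reduction):
    """Throughput multiplier implied by cutting latency by `reduction` (0 to 1)."""
    return 1.0 / (1.0 - reduction)


def latency_reduction_from_speedup(speedup):
    """Fractional latency cut implied by a throughput multiplier."""
    return 1.0 - 1.0 / speedup


# Baseline mAP implied by the reported absolute improvements.
implied_baselines = {
    "YOLOX-Nano": round(30.6 - 4.8, 1),
    "NanoDet": round(30.6 - 7.1, 1),
    "YOLOv5s": round(40.9 - 3.7, 1),
}
for name, value in implied_baselines.items():
    print(f"{name}: implied mAP {value}%")

# PicoDet-S at input size 320: 123 FPS, or 150 FPS using Paddle Lite.
for fps in (123, 150):
    print(f"{fps} FPS -> {latency_ms(fps):.2f} ms per frame")

# A 55% latency reduction expressed as a throughput multiplier.
print(f"55% lower latency -> {speedup_from_latency_reduction(0.55):.2f}x throughput")

# Two possible readings of "44% faster".
print(f"1.44x throughput -> {latency_reduction_from_speedup(1.44):.1%} lower latency")
print(f"44% lower latency -> {speedup_from_latency_reduction(0.44):.2f}x throughput")
```
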

Implications and Future Directions

The results indicate that architecture choices and training refinements tailored to mobile hardware can improve real-time object detection on mobile devices. In practice, this supports applications such as autonomous driving and intelligent transportation, where detection must be both fast and accurate.

On the research side, PP-PicoDet shows that NAS and anchor-free methods are viable in mobile settings and motivates further work on lightweight model design. Future work might explore more advanced NAS techniques or combine anchor-free strategies with newer computer vision approaches.

In conclusion, PP-PicoDet advances mobile object detection by combining an efficient architecture with improved training to raise accuracy without sacrificing latency. The same strategies may carry over to other AI systems that run on resource-constrained mobile platforms.

Authors (15)

Guanghua Yu (4 papers)
Qinyao Chang (4 papers)
Wenyu Lv (8 papers)
Chang Xu (323 papers)
Cheng Cui (15 papers)
Wei Ji (202 papers)
Qingqing Dang (15 papers)
Kaipeng Deng (4 papers)
Guanzhong Wang (34 papers)
Yuning Du (25 papers)
Baohua Lai (11 papers)
Qiwen Liu (7 papers)
Xiaoguang Hu (18 papers)
Dianhai Yu (37 papers)
Yanjun Ma (29 papers)

Citations (103)

GitHub

GitHub - PaddlePaddle/PaddleDetection: Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection. (12,787 stars)