
PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices

Published 1 Nov 2021 in cs.CV | (2111.00902v1)

Abstract: The better accuracy and efficiency trade-off has been a challenging problem in object detection. In this work, we are dedicated to studying key optimizations and neural network architecture choices for object detection to improve accuracy and efficiency. We investigate the applicability of the anchor-free strategy on lightweight object detection models. We enhance the backbone structure and design the lightweight structure of the neck, which improves the feature extraction ability of the network. We improve label assignment strategy and loss function to make training more stable and efficient. Through these optimizations, we create a new family of real-time object detectors, named PP-PicoDet, which achieves superior performance on object detection for mobile devices. Our models achieve better trade-offs between accuracy and latency compared to other popular models. PicoDet-S with only 0.99M parameters achieves 30.6% mAP, which is an absolute 4.8% improvement in mAP while reducing mobile CPU inference latency by 55% compared to YOLOX-Nano, and is an absolute 7.1% improvement in mAP compared to NanoDet. It reaches 123 FPS (150 FPS using Paddle Lite) on mobile ARM CPU when the input size is 320. PicoDet-L with only 3.3M parameters achieves 40.9% mAP, which is an absolute 3.7% improvement in mAP and 44% faster than YOLOv5s. As shown in Figure 1, our models far outperform the state-of-the-art results for lightweight object detection. Code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection.
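
The headline figures in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and reading "44% faster" as a throughput ratio is an assumption, since the abstract does not define the term.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert frames per second to per-frame latency in milliseconds."""
    return 1000.0 / fps


def implied_baseline_map(reported_map: float, absolute_gain: float) -> float:
    """Baseline mAP implied by a reported mAP and an absolute improvement."""
    return reported_map - absolute_gain


def latency_ratio_from_reduction(pct_reduction: float) -> float:
    """Latency relative to the baseline after a stated percentage reduction."""
    return 1.0 - pct_reduction / 100.0


def latency_ratio_from_speedup(pct_faster: float) -> float:
    """Latency relative to the baseline if 'X% faster' means X% higher
    throughput (an assumption, not something the abstract states)."""
    return 1.0 / (1.0 + pct_faster / 100.0)


if __name__ == "__main__":
    # PicoDet-S at input size 320: 123 FPS, or 150 FPS with Paddle Lite.
    print(round(fps_to_latency_ms(123), 2))   # about 8.13 ms per frame
    print(round(fps_to_latency_ms(150), 2))   # about 6.67 ms per frame

    # Baseline mAP values implied by the stated absolute gains.
    print(round(implied_baseline_map(30.6, 4.8), 1))  # YOLOX-Nano: 25.8
    print(round(implied_baseline_map(30.6, 7.1), 1))  # NanoDet: 23.5
    print(round(implied_baseline_map(40.9, 3.7), 1))  # YOLOv5s: 37.2

    # A 55% latency reduction and "44% faster" are different quantities.
    print(round(latency_ratio_from_reduction(55), 2))  # 0.45 of baseline
    print(round(latency_ratio_from_speedup(44), 2))    # about 0.69 of baseline
```

Under that reading, "44% faster" corresponds to roughly a 31% drop in latency, which is why the two percentages should not be compared directly.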

Citations (103)

Summary

  • The paper presents a novel lightweight, anchor-free architecture that enhances real-time object detection on mobile devices.
  • It details architectural enhancements including the CSP-PAN neck and Enhanced ShuffleNet backbone to optimize feature extraction and reduce computational cost.
  • Advanced training strategies like SimOTA dynamic label assignment and a one-shot NAS pipeline yield significant improvements in mAP and inference latency.

Overview of PP-PicoDet: Enhancing Real-Time Object Detection on Mobile Devices

The paper introduces PP-PicoDet, a family of lightweight, anchor-free object detectors designed for mobile devices. Mobile hardware constrains both compute and memory, so a detector must trade accuracy against latency. The authors address this trade-off by refining the backbone, neck, label assignment strategy, and loss function.

With these architectural and training changes, the PP-PicoDet models reach higher accuracy at lower latency than existing lightweight detectors. PicoDet-S improves on YOLOX-Nano and NanoDet, and PicoDet-L improves on YOLOv5s, which illustrates the family's suitability for mobile deployment.

Key Contributions

  1. Architecture Enhancements:
    • The paper adapts the CSP structure to build the CSP-PAN neck, which reduces parameters and improves feature extraction, and uses depthwise separable convolutions to enlarge the receptive field.
    • A novel backbone, Enhanced ShuffleNet (ESNet), is proposed, building on ShuffleNetV2 for better performance in mobile contexts.
  2. Training Strategies:
    • Training uses the SimOTA dynamic label assignment strategy with a modified cost matrix: the cost is a weighted combination of Varifocal Loss and GIoU Loss, which improves both accuracy and training stability.
    • A one-shot NAS pipeline is developed that automates the search for optimal architectures directly on detection datasets, focusing on channel-wise optimization to achieve efficient architecture discovery.
  3. Performance Evaluation:
    • PicoDet-S, with only 0.99M parameters, achieves a mean Average Precision (mAP) of 30.6%, an absolute gain of 4.8% over YOLOX-Nano, while reducing mobile CPU inference latency by 55%.
    • PicoDet-L, with only 3.3M parameters, achieves an mAP of 40.9%, an absolute gain of 3.7% over YOLOv5s, while running 44% faster.
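
The label assignment cost described under Training Strategies can be sketched in a few lines. This is a minimal illustration, not the PaddleDetection implementation: the function names are hypothetical, the weight `lam`, the Varifocal Loss defaults (`alpha`, `gamma`), and `top_k` are placeholder values to check against the paper, and the dynamic-k step is simplified (it does not resolve a prediction claimed by several ground-truth boxes).

```python
import math


def iou_and_giou(a, b):
    """Boxes are (x1, y1, x2, y2) with positive area. Returns (IoU, GIoU)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union
    # Area of the smallest box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return iou, iou - (enclose - union) / enclose


def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Varifocal Loss for one predicted score p and target q (here, the IoU)."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # keep log() finite
    if q > 0:
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    return -alpha * (p ** gamma) * math.log(1.0 - p)


def cost_and_iou_matrices(pred_boxes, pred_scores, gt_boxes, lam=6.0):
    """cost[g][p] = VFL + lam * (1 - GIoU); lam is an assumed placeholder."""
    costs, ious = [], []
    for gt in gt_boxes:
        cost_row, iou_row = [], []
        for box, score in zip(pred_boxes, pred_scores):
            iou, giou = iou_and_giou(box, gt)
            cost_row.append(varifocal_loss(score, iou) + lam * (1.0 - giou))
            iou_row.append(iou)
        costs.append(cost_row)
        ious.append(iou_row)
    return costs, ious


def dynamic_k_assign(costs, ious, top_k=10):
    """For each ground truth, keep the k lowest-cost predictions, where k is
    the truncated sum of its top_k IoUs (at least 1)."""
    assignments = []
    for cost_row, iou_row in zip(costs, ious):
        k = max(1, int(sum(sorted(iou_row, reverse=True)[:top_k])))
        order = sorted(range(len(cost_row)), key=cost_row.__getitem__)
        assignments.append(order[:k])
    return assignments


if __name__ == "__main__":
    gts = [(0.0, 0.0, 10.0, 10.0)]
    preds = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 9.0, 9.0), (20.0, 20.0, 30.0, 30.0)]
    scores = [0.9, 0.5, 0.9]
    costs, ious = cost_and_iou_matrices(preds, scores, gts)
    print(dynamic_k_assign(costs, ious))  # the exact-overlap prediction is chosen
```

Because the GIoU term is weighted by `lam`, a confident but badly localized prediction (the third box above) still receives a high cost, so the assignment favors predictions that are both well scored and well placed.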

Implications and Future Directions

The research presents compelling evidence for the efficacy of optimized neural architectures and refined training methodologies in enhancing real-time object detection on mobile devices. Practically, these innovations facilitate high-performance applications in domains such as autonomous driving and intelligent transportation, where rapid, accurate detection is crucial.

Theoretically, PP-PicoDet highlights the potential of NAS and anchor-free methodologies in mobile settings, paving the way for further research into lightweight and efficient model designs. Future developments might explore more advanced NAS techniques or synergy between anchor-free strategies and emerging computer vision paradigms.

In conclusion, PP-PicoDet advances mobile object detection by combining an efficient backbone and neck with improved label assignment and loss design. Its results suggest that these strategies transfer well to resource-constrained environments and are promising for future AI systems on mobile platforms.
