SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications (1907.11093v1)

Published 25 Jul 2019 in cs.CV

Abstract: Drones or general Unmanned Aerial Vehicles (UAVs), endowed with computer vision function by on-board cameras and embedded systems, have become popular in a wide range of applications. However, real-time scene parsing through object detection running on a UAV platform is very challenging, due to limited memory and computing power of embedded devices. To deal with these challenges, in this paper we propose to learn efficient deep object detectors through channel pruning of convolutional layers. To this end, we enforce channel-level sparsity of convolutional layers by imposing L1 regularization on channel scaling factors and prune less informative feature channels to obtain "slim" object detectors. Based on such approach, we present SlimYOLOv3 with fewer trainable parameters and floating point operations (FLOPs) in comparison of original YOLOv3 (Joseph Redmon et al., 2018) as a promising solution for real-time object detection on UAVs. We evaluate SlimYOLOv3 on VisDrone2018-Det benchmark dataset; compelling results are achieved by SlimYOLOv3 in comparison of unpruned counterpart, including ~90.8% decrease of FLOPs, ~92.0% decline of parameter size, running ~2 times faster and comparable detection accuracy as YOLOv3. Experimental results with different pruning ratios consistently verify that proposed SlimYOLOv3 with narrower structure are more efficient, faster and better than YOLOv3, and thus are more suitable for real-time object detection on UAVs. Our codes are made publicly available at https://github.com/PengyiZhang/SlimYOLOv3.

SlimYOLOv3: Advancements in Real-Time Object Detection for UAVs

Real-time object detection on Unmanned Aerial Vehicles (UAVs) requires models that run within the limited computing power and memory of embedded hardware. The paper "SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications" by Zhang, Zhong, and Li addresses this challenge by applying channel pruning to the convolutional layers of YOLOv3, and presents the resulting model, SlimYOLOv3, as a practical solution.

Channel Pruning and Efficiency

Channel pruning is a model compression technique that shrinks a network by removing the less informative channels from its convolutional layers. The paper applies it to YOLOv3 in two stages: channel-level sparsity training, in which L1 regularization on the channel scaling factors pushes the factors of unimportant channels toward zero, followed by pruning of the less informative channels. The result is a "slim" model: SlimYOLOv3.
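The two stages described above can be illustrated with a minimal, framework-free sketch. It assumes that each channel has a single scalar scaling factor (in practice this is typically the batch-normalization scale parameter, as in network-slimming methods) and that pruning uses one global magnitude threshold derived from a target pruning ratio. The function names, the `penalty` weight, and the rule that keeps at least one channel per layer are illustrative assumptions, not the authors' code.

```python
from typing import List


def l1_subgradient_step(gammas: List[float], task_grads: List[float],
                        lr: float, penalty: float) -> List[float]:
    """One SGD step on the scaling factors with an added L1 sparsity term.

    The term penalty * |gamma| adds penalty * sign(gamma) to the gradient,
    which steadily pulls the factors toward zero.
    """
    def sign(x: float) -> float:
        return float((x > 0) - (x < 0))

    return [g - lr * (tg + penalty * sign(g))
            for g, tg in zip(gammas, task_grads)]


def global_threshold(layers: List[List[float]], prune_ratio: float) -> float:
    """Magnitude below which channels are pruned, computed over all layers."""
    magnitudes = sorted(abs(g) for layer in layers for g in layer)
    cut = min(int(len(magnitudes) * prune_ratio), len(magnitudes) - 1)
    return magnitudes[cut]


def channel_masks(layers: List[List[float]],
                  prune_ratio: float) -> List[List[bool]]:
    """Per-layer keep/prune masks (True means the channel is kept)."""
    thr = global_threshold(layers, prune_ratio)
    masks = []
    for layer in layers:
        mask = [abs(g) >= thr for g in layer]
        if not any(mask):
            # Assumption: never remove a whole layer; keep its strongest channel.
            best = max(range(len(layer)), key=lambda i: abs(layer[i]))
            mask[best] = True
        masks.append(mask)
    return masks
```

With hypothetical factors `[[0.9, 0.01, 0.5, 0.02], [0.03, 0.7, 0.04, 0.8]]` and a pruning ratio of 0.5, the global threshold is 0.5 and each layer keeps its two largest-magnitude channels. A real pipeline would then rebuild the convolutional layers with only the kept channels and fine-tune the narrower network.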

Pruning substantially reduces the number of trainable parameters and floating point operations (FLOPs). The reported results include a ~90.8% reduction in FLOPs and a ~92.0% decrease in parameter size, with detection accuracy comparable to that of the original YOLOv3. SlimYOLOv3 also runs about twice as fast, which makes the approach attractive for real-time tasks on UAV platforms.
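To put the percentages above in perspective, the short calculation below converts a percentage reduction into a compression factor. It uses only the figures quoted in the text; the helper names are illustrative.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the original cost left after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def compression_factor(reduction_pct: float) -> float:
    """How many times smaller the pruned model is than the original."""
    return 1.0 / remaining_fraction(reduction_pct)


flops_factor = compression_factor(90.8)  # about 10.9x fewer FLOPs
param_factor = compression_factor(92.0)  # about 12.5x fewer parameters
```

A roughly 10.9-fold cut in FLOPs alongside a roughly 2-fold speedup suggests that measured inference time does not scale linearly with FLOPs.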

Experimental Results

On the VisDrone2018-Det benchmark dataset, the paper reports that SlimYOLOv3 offers a favorable trade-off between complexity and accuracy. Notably, with an input size of 832×832, SlimYOLOv3-SPP3 achieves a mean Average Precision (mAP) that rivals its unpruned counterpart while requiring computational resources comparable to those of YOLOv3-tiny. The results also support the use of spatial pyramid pooling (SPP) modules, which enrich feature extraction through multiscale receptive fields.
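The idea of multiscale receptive fields in an SPP module can be sketched as follows: the same feature map is max-pooled with several kernel sizes at stride 1, and the pooled copies are stacked with the input. This is a simplified single-channel illustration in plain Python; the kernel sizes (5, 9, 13) are an assumption based on the common YOLOv3-SPP configuration, and the function names are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def max_pool_same(fmap: Grid, kernel: int) -> Grid:
    """Stride-1 max pooling that keeps the spatial size of the input.

    Windows are clipped at the borders, which is equivalent to padding
    with negative infinity.
    """
    h, w = len(fmap), len(fmap[0])
    r = kernel // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [fmap[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


def spp(fmap: Grid, kernels: Sequence[int] = (5, 9, 13)) -> List[Grid]:
    """Return the input plus one pooled copy per kernel size.

    In a real network this stacking is a concatenation along the channel axis.
    """
    return [fmap] + [max_pool_same(fmap, k) for k in kernels]
```

Because every pooled copy has the same spatial size as the input, later layers see, at each location, features summarized over small, medium, and large neighborhoods at once.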

Furthermore, experiments with different pruning ratios consistently show that SlimYOLOv3 is more efficient than YOLOv3, which supports its suitability for real-time applications in challenging environments.

Implications and Future Directions

The research suggests that channel pruning not only addresses the constraints of UAV deployment but is also a step toward optimizing neural network architectures for other resource-constrained scenarios. One implication is a potential reduction in power consumption, a critical factor for extended UAV operations.

However, the authors note limitations in handling category imbalance within datasets. Potential directions for future work include mechanisms that counteract this imbalance, to improve detection accuracy across object categories, and the extension of pruning techniques to other neural network architectures tailored for UAVs.

Overall, SlimYOLOv3 advances the deployment of deep learning models on UAVs by offering a framework for balancing accuracy against computational cost. Approaches of this kind can pave the way for more adaptable and robust object detection systems on UAVs and similar platforms.