Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images

Published 25 Mar 2023 in cs.CV | (2303.14488v1)

Abstract: Object detection on drone images with low-latency is an important but challenging task on the resource-constrained unmanned aerial vehicle (UAV) platform. This paper investigates optimizing the detection head based on the sparse convolution, which proves effective in balancing the accuracy and efficiency. Nevertheless, it suffers from inadequate integration of contextual information of tiny objects as well as clumsy control of the mask ratio in the presence of foreground with varying scales. To address the issues above, we propose a novel global context-enhanced adaptive sparse convolutional network (CEASC). It first develops a context-enhanced group normalization (CE-GN) layer, by replacing the statistics based on sparsely sampled features with the global contextual ones, and then designs an adaptive multi-layer masking strategy to generate optimal mask ratios at distinct scales for compact foreground coverage, promoting both the accuracy and efficiency. Extensive experimental results on two major benchmarks, i.e. VisDrone and UAVDT, demonstrate that CEASC remarkably reduces the GFLOPs and accelerates the inference procedure when plugging into the typical state-of-the-art detection frameworks (e.g. RetinaNet and GFL V1) with competitive performance. Code is available at https://github.com/Cuogeihong/CEASC.


Authors (4)

Citations (40)


Summary

The paper introduces the global context-enhanced adaptive sparse convolutional network (CEASC), which optimizes the detection head with sparse convolution to cut the computational cost of object detection on drone images while keeping accuracy competitive.
CEASC combines Context-Enhanced Sparse Convolution (CESC), which normalizes sparse features with global context statistics, and Adaptive Multi-layer Masking (AMM), which adjusts the mask ratio at each scale.
Experiments on the VisDrone and UAVDT datasets show that CEASC substantially reduces GFLOPs (by over 70% on GFL V1) while maintaining or improving mAP, which suits deployment on resource-constrained UAVs.

Adaptive Sparse Convolutional Networks for Drone Image Detection

The paper proposes an approach to object detection on drone imagery that targets the trade-off between accuracy and computational efficiency. The authors introduce context-enhanced adaptive sparse convolutional networks (CEASC) to overcome the limitations of existing methods, which either pursue accuracy at the expense of computational cost or fail to adequately optimize the detection head for resource-constrained platforms such as unmanned aerial vehicles (UAVs).

Key Contributions and Methodology

The CEASC framework has two primary components: Context-Enhanced Sparse Convolution (CESC) and Adaptive Multi-layer Masking (AMM). They address two weaknesses of sparse convolutional networks on drone imagery: poor integration of contextual information and coarse control of mask ratios across varying scales.

Context-Enhanced Sparse Convolution (CESC): CESC uses a context-enhanced Group Normalization (CE-GN) layer, which replaces the statistics computed from sparsely sampled features with global contextual ones during feature normalization. This compensates for the context lost through sparse sampling, stabilizing the feature distribution and improving detection accuracy. A residual structure further mitigates the loss by combining the sparse feature maps with the global context.
Adaptive Multi-layer Masking (AMM): AMM dynamically adjusts the mask ratio at each level of the feature pyramid network. It estimates mask ratios that cover the foreground compactly, reducing computation while maintaining detection accuracy. This lets the detector adapt to the varying object scales found in drone images.
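
The two components above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the flat single-group feature lists, and in particular the squared-gap penalty in mask_ratio_penalty are assumptions made for illustration, and the paper's actual formulation may differ. The first part contrasts normalizing sparsely sampled values with their own statistics against normalizing them with statistics of a dense global-context feature; the second measures how far each pyramid level's active-mask ratio is from its foreground ratio.

```python
from statistics import fmean, pstdev


def normalize_with_stats(values, mean, std, eps=1e-5):
    """Standardize values with externally supplied statistics."""
    return [(v - mean) / (std + eps) for v in values]


def context_enhanced_norm(sparse_values, global_context, eps=1e-5):
    """Normalize sparsely sampled values with the mean and standard deviation
    of a dense global-context feature (assumed form, single group)."""
    return normalize_with_stats(
        sparse_values, fmean(global_context), pstdev(global_context), eps
    )


def mask_ratio_penalty(level_masks, level_foreground):
    """Mean squared gap between each level's active-mask ratio and its
    foreground ratio (hypothetical objective, one term per pyramid level)."""
    gaps = []
    for mask, foreground in zip(level_masks, level_foreground):
        active = sum(mask) / len(mask)
        target = sum(foreground) / len(foreground)
        gaps.append((active - target) ** 2)
    return fmean(gaps)


# Hypothetical dense feature values for one group, and a mask keeping two of them.
dense = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
mask = [0, 0, 0, 0, 0, 0, 1, 1]
sparse = [v for v, m in zip(dense, mask) if m]

# Statistics from the sampled values only versus statistics from the dense map.
own_stats = normalize_with_stats(sparse, fmean(sparse), pstdev(sparse))
ctx_stats = context_enhanced_norm(sparse, dense)

# Two hypothetical pyramid levels: the second mask is twice as large as its foreground.
penalty = mask_ratio_penalty(
    [[1, 0, 0, 0], [1, 1, 0, 0]],
    [[1, 0, 0, 0], [1, 0, 0, 0]],
)
```

With their own statistics, the two sampled values end up centered on zero and lose their position relative to the rest of the map. With the global statistics, both stay above the map-wide mean, which is the kind of context the CE-GN description refers to. The penalty is zero for a level whose mask ratio matches its foreground ratio and grows as the mask over- or under-covers.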

Experimental Results

Empirical evaluations were conducted on two benchmarks, VisDrone and UAVDT. Applying CEASC to standard detection frameworks such as RetinaNet and GFL V1 significantly reduced GFLOPs while maintaining competitive mAP. For instance, when integrated with GFL V1, CEASC reduced the computational load by over 70% while slightly improving detection performance. This efficiency gain matters for deployment on resource-constrained UAVs.
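
A rough back-of-the-envelope estimate can show why masking translates into GFLOPs savings. The sketch below assumes that the cost of a sparse convolution scales with the fraction of output positions the mask leaves active. The feature-map sizes, channel counts, and active ratios are hypothetical values chosen for illustration, not figures from the paper, and the estimate ignores the backbone and the cost of predicting the masks.

```python
def conv_macs(height, width, c_in, c_out, kernel, active_ratio=1.0):
    """Multiply-accumulates of a kernel x kernel convolution evaluated only
    at the active fraction of the height x width output positions."""
    return height * width * active_ratio * c_in * c_out * kernel * kernel


def relative_reduction(dense_cost, sparse_cost):
    """Fraction of the dense cost saved by the sparse variant."""
    return 1.0 - sparse_cost / dense_cost


# Hypothetical pyramid levels: (height, width, active ratio of the mask).
levels = [(100, 100, 0.125), (50, 50, 0.25), (25, 25, 0.5)]

dense = sum(conv_macs(h, w, 256, 256, 3) for h, w, _ in levels)
sparse = sum(conv_macs(h, w, 256, 256, 3, r) for h, w, r in levels)
saving = relative_reduction(dense, sparse)  # 5/6 for these made-up values
```

In this toy setting the highest-resolution level holds most of the output positions, so a small active ratio there accounts for most of the saving. That is one reason per-level control of the mask ratio matters.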

Implications and Future Work

The implications of this research are twofold. Practically, it offers a viable solution for deploying high-efficiency, low-latency object detection systems on UAV platforms, which are increasingly used in areas such as surveillance, agriculture, and delivery services. Theoretically, it contributes to the discourse on sparse convolution networks by demonstrating how context enhancement and adaptive mechanisms can significantly refine the balance between accuracy and computational resource usage.

Moving forward, potential research avenues include extending the CEASC framework to other domains involving constrained hardware environments, such as mobile devices and autonomous vehicles. Exploration into more sophisticated adaptive techniques, including reinforcement learning or dynamic optimization algorithms, could further improve the trade-off between computational efficiency and detection accuracy.

Overall, the work is a meaningful step in the development and application of efficient object detection networks, and it supports broader adoption in real-world UAV applications.
