- The paper achieves enhanced object detection by integrating dense connections to improve feature propagation and mitigate gradient issues.
- It employs an improved spatial pyramid pooling mechanism that fuses multi-scale features for superior accuracy, especially in small object detection.
- Rigorous benchmarks show that DC-SPP-YOLO increases mAP on standard datasets while maintaining near real-time performance.
Evaluation of DC-SPP-YOLO: Enhancements to YOLOv2 for Object Detection
The paper presents DC-SPP-YOLO, an object detection framework that integrates a Dense Connection (DC) structure and Spatial Pyramid Pooling (SPP) into the existing YOLOv2 architecture. These enhancements address known limitations of YOLOv2, namely restricted detection accuracy caused by an underperforming backbone and underutilization of multi-scale region features. The objective is to improve detection precision without significantly compromising real-time speed.
Methodological Improvements
The paper details three primary modifications to the YOLOv2 architecture:
- Dense Connection Integration: The backbone network adopts a dense connection structure intended to mitigate the vanishing-gradient problem that affects deep networks. Dense connectivity maximizes information flow and strengthens feature propagation. By placing the dense connection structure at strategic layers, notably the deeper layers that extract richer semantic information, the method improves the backbone's feature extraction capability.
- Improved Spatial Pyramid Pooling: Whereas the original YOLOv2 underutilizes local region features, the improved SPP block fuses local and global multi-scale features pooled from a single convolutional feature map. This pooling mechanism enables more effective detection across varying scales, contributing to higher accuracy, especially for small objects.
- Novel Loss Function: A hybrid loss function, combining Mean Squared Error (MSE) for localization and cross-entropy for classification, is adopted. This addresses the weak gradient propagation observed when MSE alone is used, as in YOLOv2, potentially accelerating convergence during training and improving robustness.
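The dense connectivity described above can be sketched in a few lines. The following is an illustrative numpy mock-up, not the paper's actual network: the `conv_layer` stand-in replaces a real 3x3 convolution with a random linear map over channels, and `growth_rate` plays the role of DenseNet's growth factor k. It shows the key mechanism: each layer receives the concatenation of all preceding feature maps, giving features (and gradients) short paths back to earlier layers.

```python
import numpy as np

def conv_layer(x, out_channels, rng):
    """Stand-in for a 3x3 convolution: a random linear map over channels
    followed by ReLU. (Illustrative only; spatial filtering is omitted.)"""
    in_channels = x.shape[0]
    w = rng.standard_normal((out_channels, in_channels)) * 0.1
    return np.maximum(np.tensordot(w, x, axes=([1], [0])), 0.0)

def dense_block(x, num_layers, growth_rate, rng):
    """Each layer consumes the channel-wise concatenation of ALL previous
    outputs, so every feature map is reused by every later layer."""
    features = [x]
    for _ in range(num_layers):
        concat = np.concatenate(features, axis=0)   # channel-wise concat
        new = conv_layer(concat, growth_rate, rng)  # k new feature maps
        features.append(new)
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 13, 13))               # C x H x W feature map
out = dense_block(x, num_layers=4, growth_rate=16, rng=rng)
print(out.shape)  # channels grow to 32 + 4 * 16 = 96
```

Note how the channel count grows linearly with depth; in the real architecture, 1x1 convolutions keep this growth manageable.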
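The SPP fusion can likewise be illustrated with a small numpy sketch. This is an assumption-laden mock-up rather than the paper's implementation: it uses stride-1 max pooling with "same" padding at a few kernel sizes (the 5/9/13 sizes here are illustrative choices) so the pooled maps keep the spatial resolution of the input and can be concatenated along the channel axis.

```python
import numpy as np

def max_pool_same(x, k):
    """k x k max pooling, stride 1, 'same' padding: output keeps H x W."""
    c, h, w = x.shape
    p = k // 2
    padded = np.full((c, h + 2 * p, w + 2 * p), -np.inf)
    padded[:, p:p + h, p:p + w] = x
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = padded[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def spp_block(x, kernel_sizes=(5, 9, 13)):
    """Fuse local and global context from ONE feature map by concatenating
    the input with its poolings at several receptive-field sizes."""
    pooled = [x] + [max_pool_same(x, k) for k in kernel_sizes]
    return np.concatenate(pooled, axis=0)

x = np.random.default_rng(1).standard_normal((64, 13, 13))
fused = spp_block(x)
print(fused.shape)  # (256, 13, 13): 64 channels x (1 input + 3 scales)
```

Because every pooled map preserves H x W, the block slots into the network without changing the detection grid, which is what lets the detector see multi-scale context at essentially no structural cost.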
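The hybrid loss can be written down directly. The sketch below is a simplified single-box, single-cell version (the real YOLO loss sums over grid cells and anchors, with weighting terms omitted here): squared error on the box coordinates plus cross-entropy on the class logits, where cross-entropy avoids the small, poorly scaled gradients that squared error produces on softmax outputs.

```python
import numpy as np

def hybrid_loss(pred_box, true_box, class_logits, true_class):
    """MSE on box coordinates + cross-entropy on class prediction.
    Simplified to one box and one cell for illustration."""
    loc = np.mean((pred_box - true_box) ** 2)       # localization term
    z = class_logits - class_logits.max()           # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    cls = -log_probs[true_class]                    # cross-entropy term
    return loc + cls

pred_box = np.array([0.52, 0.48, 0.30, 0.25])       # predicted x, y, w, h
true_box = np.array([0.50, 0.50, 0.28, 0.24])       # ground-truth box
loss = hybrid_loss(pred_box, true_box, np.array([2.0, 0.5, -1.0]), 0)
print(loss)
```

A perfect prediction drives the MSE term to zero while the cross-entropy term approaches zero only asymptotically, so classification continues to supply useful gradient late in training.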
Numerical Results and Analysis
The DC-SPP-YOLO method was evaluated on established benchmarks, including the PASCAL VOC and UA-DETRAC datasets. The quantitative results show a consistent improvement in mean Average Precision (mAP) over baseline YOLOv2 across these datasets. Specifically, on the PASCAL VOC 2007 test set, DC-SPP-YOLO achieved an mAP of 78.4% at 56.3 fps and 79.6% at 38.9 fps, a modest accuracy gain at the cost of a small decrease in speed.
Compared with contemporary object detection frameworks built on deeper networks, such as Faster R-CNN with ResNet-101 and SSD with VGG16, DC-SPP-YOLO remains competitive, particularly in its trade-off between accuracy and computational efficiency.
Theoretical Implications and Future Work
The introduction of dense connections and spatial pyramid pooling to YOLOv2 is an incremental yet noteworthy enhancement to the convolutional paradigm for real-time object detection, specifically improving handling of diverse object scales and resilience against vanishing gradients. These improvements make the framework a practical candidate for deployment in resource-constrained settings such as intelligent transportation systems and mobile robotics.
Potential directions for future research include exploring backbone architectures that increase both depth and width, refining spatial pooling techniques, and developing learning mechanisms that improve detection efficiency in complex scenes. Another avenue is strengthening rotation and scale invariance, to better recognize objects under varying orientations and sizes.
In conclusion, the DC-SPP-YOLO framework offers a meaningful refinement of the foundational YOLOv2, promising improved applicability in areas that require high-speed, high-accuracy object detection while maintaining conceptual simplicity.