An Analysis of Feature-Fused SSD: Fast Detection for Small Objects
The paper "Feature-Fused SSD: Fast Detection for Small Objects" introduces a method to enhance the detection accuracy of small objects in real-time object detection systems, without sacrificing speed, by employing feature fusion within the Single Shot MultiBox Detector (SSD) framework. The approach centers on two feature fusion modules: a concatenation module and an element-sum module.
Core Contributions and Methodology
The authors build on SSD because of its effective balance between speed and accuracy. SSD predicts objects of various sizes from a pyramid-like hierarchy of feature maps across multiple layers; however, the shallow layers responsible for small objects are semantically weak, which makes small-object detection difficult. To address this, the paper augments SSD with a multi-level feature fusion methodology that injects contextual information from deeper layers into the shallow layers used for small object detection.
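SSD's pyramid can be made concrete by counting the default boxes each prediction layer contributes. The sketch below uses the standard SSD300 configuration (the feature map sizes and boxes-per-location come from the original SSD design, not from this paper); shallow, high-resolution maps carry most of the boxes and are the ones tasked with small objects.

```python
# SSD300's six prediction feature maps: (spatial size, default boxes per cell).
# Shallow, high-resolution maps (e.g. 38x38) are matched to small objects;
# deep, low-resolution maps (e.g. 1x1) are matched to large ones.
ssd300_scales = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

def total_default_boxes(scales):
    """Sum default boxes over all pyramid levels."""
    return sum(size * size * boxes for size, boxes in scales)

print(total_default_boxes(ssd300_scales))  # 8732 default boxes for SSD300
```

Note that the 38x38 map alone accounts for 5776 of the 8732 boxes, which is why enriching its features pays off for small objects.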
Feature Fusion Modules:
- Concatenation Module: The shallow (target) and deeper (contextual) feature maps are concatenated along the channel dimension, and a 1x1 convolutional layer then learns how to weight the two sources. This design aims to mitigate the influence of background noise, a common issue when embedding context.
- Element-Sum Module: This module fuses features by summing them element-wise with manually set weights, on the premise that added context amplifies features useful for detection, at the cost of forgoing learned weighting.
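The two strategies can be sketched minimally in NumPy, assuming the deeper (contextual) map has already been upsampled to the shallow map's spatial size. The 1x1 convolution is expressed as a per-pixel matrix multiply over channels, and all weights here are random stand-ins for learned parameters, not the paper's trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution as a per-pixel linear map over channels.
    x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)."""
    return x @ w

def concat_fusion(shallow, context, w):
    """Concatenation module: stack channels, then let a (here random,
    in practice learned) 1x1 conv weight target vs. contextual info."""
    fused = np.concatenate([shallow, context], axis=-1)  # (H, W, C1+C2)
    return conv1x1(fused, w)

def element_sum_fusion(shallow, context, alpha=0.5):
    """Element-sum module: fixed-weight element-wise sum; channel
    counts must already match."""
    return alpha * shallow + (1.0 - alpha) * context

# Toy 4x4 feature maps with 8 channels each.
shallow = rng.standard_normal((4, 4, 8))
context = rng.standard_normal((4, 4, 8))

w = rng.standard_normal((16, 8))        # maps (C1+C2) channels back to C_out
out_cat = concat_fusion(shallow, context, w)
out_sum = element_sum_fusion(shallow, context)
print(out_cat.shape, out_sum.shape)     # (4, 4, 8) (4, 4, 8)
```

The design trade-off is visible in the signatures: the concatenation module spends parameters (`w`) to learn the mixing, while the element-sum module is parameter-free but fixes the mixing ratio by hand.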
Performance Analysis
The experimental evaluation was conducted on the PASCAL VOC2007 and VOC2012 datasets. The feature-fused SSD models improve mean Average Precision (mAP) by 1.6 and 1.7 points over the baseline SSD, with especially notable gains of 2 to 3 points in small-object categories such as bird, boat, and potted plant.
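For context on the metric: per-class AP on VOC2007 uses the 11-point interpolated protocol, and mAP averages it over classes. The sketch below implements that protocol; the precision/recall arrays are illustrative, not the paper's measurements.

```python
import numpy as np

def voc07_ap(recall, precision):
    """11-point interpolated AP (PASCAL VOC2007 protocol): average,
    over recall thresholds 0.0, 0.1, ..., 1.0, of the maximum
    precision achieved at recall >= threshold."""
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        mask = recall >= t
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap

# Illustrative curve for a perfect detector: precision 1.0 at every recall.
recall = np.linspace(0.0, 1.0, 50)
precision = np.ones_like(recall)
print(round(voc07_ap(recall, precision), 3))  # 1.0
```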
Regarding speed, the proposed models run at 40 to 43 frames per second (FPS), far faster than the Deconvolutional Single Shot Detector (DSSD), which operates at 13.6 FPS. This speed advantage is crucial for real-time applications and confirms the method's practical viability.
Implications and Future Directions
This work is practically relevant for real-time vision systems that require both high accuracy and speed, such as autonomous vehicles and surveillance systems. The feature fusion approach also exemplifies a promising direction for enhancing existing architectures like SSD to tackle the perennial challenge of small object detection.
Looking forward, further research could refine the fusion techniques so that context-aware features are selected adaptively while background noise is suppressed. Extending the methodology to more cluttered scenes, or integrating it with other modern detection frameworks, could yield further gains.
In summary, this paper makes a valuable contribution to the object detection literature by improving the accuracy-speed trade-off for small object detection through a methodical use of feature fusion, and it is likely to inspire follow-up work in this direction.