FSSD: Feature Fusion Single Shot Multibox Detector (1712.00960v4)

Published 4 Dec 2017 in cs.CV

Abstract: SSD (Single Shot Multibox Detector) is one of the best object detection algorithms, offering both high accuracy and fast speed. However, SSD's feature pyramid detection method makes it hard to fuse features from different scales. In this paper, we propose FSSD (Feature Fusion Single Shot Multibox Detector), an enhanced SSD with a novel and lightweight feature fusion module that improves performance significantly over SSD with only a small drop in speed. In the feature fusion module, features from different layers with different scales are concatenated together, followed by some down-sampling blocks that generate a new feature pyramid, which is fed to multibox detectors to predict the final detection results. On the Pascal VOC 2007 test set, our network achieves 82.7 mAP (mean average precision) at 65.8 FPS (frames per second) with an input size of 300×300 on a single Nvidia 1080Ti GPU. In addition, our result on COCO also exceeds that of the conventional SSD by a large margin. Our FSSD outperforms many state-of-the-art object detection algorithms in both accuracy and speed. Code is available at https://github.com/lzx1413/CAFFE_SSD/tree/fssd.

Summary

The paper introduces a lightweight feature fusion module that combines multi-scale convolutional features to improve single-shot object detection.
It reaches 82.7 mAP at 65.8 FPS on the Pascal VOC 2007 test set (300×300 input, single Nvidia 1080Ti GPU), outperforming the conventional SSD, especially on small objects, as well as many other state-of-the-art detectors.
The study points to further gains for real-time detection and to future integration with stronger backbone networks.

Feature Fusion Single Shot Multibox Detector

The paper "Feature Fusion Single Shot Multibox Detector" (FSSD) introduces an enhanced object detection framework based on the widely used Single Shot Multibox Detector (SSD). To address SSD's difficulty in fusing features across scales, FSSD adds a lightweight feature fusion module that significantly improves accuracy over the original SSD at the cost of only a small drop in speed.

Methodology Overview

The core contribution of the paper is a feature fusion module that combines feature maps taken from different convolutional layers, and therefore different scales, within the network. Once brought to a common spatial resolution, these maps are concatenated into a single fused map, and a series of down-sampling blocks then generates a new feature pyramid from it, which feeds the multibox detectors. This contrasts with the conventional SSD, which makes predictions from each pyramid level independently: the shallow, high-resolution layers responsible for small objects carry little semantic information, which hurts small-object detection in particular.
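To make the data flow concrete, here is a minimal pure-Python sketch of the fusion step and the pyramid it feeds. It is not the authors' Caffe implementation: feature maps are plain nested lists, the resize uses bilinear interpolation with an align-corners convention, and the names `bilinear_resize`, `fuse`, `conv_out_size`, `pyramid_sizes` and `SSD300_LIKE_BLOCKS` are hypothetical. The learned parts (channel-reducing convolutions, normalization, and the weights of the down-sampling blocks) are omitted; only the resize-and-concatenate step and the spatial sizes of the down-sampled pyramid are shown.

```python
from typing import List, Sequence, Tuple

# A channel is an H x W grid; a feature map is a list of channels (C x H x W).
Channel = List[List[float]]
FeatureMap = List[Channel]


def bilinear_resize(channel: Channel, out_h: int, out_w: int) -> Channel:
    """Resize one channel with bilinear interpolation (align-corners convention)."""
    in_h, in_w = len(channel), len(channel[0])
    out: Channel = []
    for i in range(out_h):
        # Position of output row i on the input grid.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row: List[float] = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate horizontally on the two neighboring rows, then vertically.
            top = channel[y0][x0] * (1 - dx) + channel[y0][x1] * dx
            bottom = channel[y1][x0] * (1 - dx) + channel[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def fuse(feature_maps: Sequence[FeatureMap], out_h: int, out_w: int) -> FeatureMap:
    """Resize every map to out_h x out_w, then concatenate along the channel axis."""
    fused: FeatureMap = []
    for fmap in feature_maps:
        for channel in fmap:
            fused.append(bilinear_resize(channel, out_h, out_w))
    return fused


def conv_out_size(size: int, kernel: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution with the given kernel/stride/padding."""
    return (size + 2 * pad - kernel) // stride + 1


# Hypothetical (kernel, stride, pad) settings for the down-sampling blocks,
# chosen only so that a 38 x 38 fused map reproduces SSD300's grid sizes.
SSD300_LIKE_BLOCKS: List[Tuple[int, int, int]] = [(3, 2, 1)] * 4 + [(3, 2, 0)]


def pyramid_sizes(base: int, blocks: Sequence[Tuple[int, int, int]]) -> List[int]:
    """Side length of each pyramid level produced by chaining the blocks."""
    sizes = [base]
    for kernel, stride, pad in blocks:
        sizes.append(conv_out_size(sizes[-1], kernel, stride, pad))
    return sizes


if __name__ == "__main__":
    small = [[[0.0, 1.0], [2.0, 3.0]]]  # 1 channel, 2 x 2 (deep, coarse layer)
    large = [[[float(4 * r + c) for c in range(4)] for r in range(4)] for _ in range(2)]
    fused = fuse([small, large], 4, 4)  # 1 + 2 channels at 4 x 4
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 4 4
    print(pyramid_sizes(38, SSD300_LIKE_BLOCKS))  # [38, 19, 10, 5, 3, 1]
```

With these hypothetical block settings, a 38×38 fused map yields levels of 38, 19, 10, 5, 3 and 1, the same grid sizes SSD300 predicts on; FSSD's actual kernel, stride and padding choices may differ.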

Experimental Results

The paper reports strong numerical results for FSSD against SSD and other state-of-the-art detectors. On the Pascal VOC 2007 test set, FSSD reaches a mean average precision (mAP) of 82.7 at 65.8 frames per second (FPS) with a 300×300 input on a single Nvidia 1080Ti GPU. This is a clear improvement over the conventional SSD, most visibly on small objects, whose detection benefits from the semantic information that the fused features bring to the high-resolution layers. FSSD also compares favorably with DSSD, which relies on the much deeper ResNet-101 backbone, while running at speeds comparable to YOLOv2, balancing speed and accuracy without the computational overhead of such deeper networks.
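The reported figures can be unpacked with a little arithmetic. The sketch below assumes that evaluation follows the PASCAL VOC 2007 11-point interpolated AP convention (the summary does not say which AP variant the authors used): it computes AP for one class, averages per-class APs into mAP (reported as a percentage in the paper), and converts throughput in FPS to per-image latency. The function names are hypothetical, and the precision/recall values are illustrative, not results from the paper.

```python
from typing import Sequence


def ap_11_point(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """11-point interpolated average precision (PASCAL VOC 2007 convention).

    For each recall threshold t in {0.0, 0.1, ..., 1.0}, take the highest
    precision reached at any recall >= t (0 if recall never reaches t),
    then average the 11 values.
    """
    total = 0.0
    for k in range(11):
        t = k / 10
        reachable = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(reachable) if reachable else 0.0
    return total / 11


def mean_ap(per_class_ap: Sequence[float]) -> float:
    """mAP: the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


def ms_per_image(fps: float) -> float:
    """Convert throughput in frames per second into per-image latency in ms."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Illustrative precision/recall points for one class (not from the paper).
    recalls = [0.1, 0.4, 0.7, 0.9]
    precisions = [1.0, 0.9, 0.8, 0.6]
    print(round(ap_11_point(recalls, precisions), 3))
    print(round(ms_per_image(65.8), 1))  # 15.2
```

For example, 65.8 FPS corresponds to roughly 15.2 ms per 300×300 image on the reported hardware.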

Implications and Future Work

FSSD has practical implications for object detection: by fusing multi-scale features at little computational cost, it points toward more efficient and accurate real-time detection systems. Its architecture could be adapted to more complex models or integrated into frameworks such as Mask R-CNN, a direction the authors suggest for future research. Replacing the VGG16 backbone with alternatives such as DenseNet or efficient lightweight networks could further broaden FSSD's applicability, particularly where computational resources are limited.

Overall, the paper offers a well-structured approach that can help deploy robust, real-time object detection across a range of applications, and it reinforces the value of feature fusion in convolutional neural networks.
