Single-Shot Refinement Neural Network for Object Detection (1711.06897v3)

Published 18 Nov 2017 in cs.CV

Abstract: For object detection, the two-stage approach (e.g., Faster R-CNN) has been achieving the highest accuracy, whereas the one-stage approach (e.g., SSD) has the advantage of high efficiency. To inherit the merits of both while overcoming their disadvantages, in this paper, we propose a novel single-shot based detector, called RefineDet, that achieves better accuracy than two-stage methods and maintains comparable efficiency of one-stage methods. RefineDet consists of two inter-connected modules, namely, the anchor refinement module and the object detection module. Specifically, the former aims to (1) filter out negative anchors to reduce search space for the classifier, and (2) coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor. The latter module takes the refined anchors as the input from the former to further improve the regression and predict multi-class label. Meanwhile, we design a transfer connection block to transfer the features in the anchor refinement module to predict locations, sizes and class labels of objects in the object detection module. The multi-task loss function enables us to train the whole network in an end-to-end way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency. Code is available at https://github.com/sfzhang15/RefineDet

Authors (5)
  1. Shifeng Zhang (46 papers)
  2. Longyin Wen (45 papers)
  3. Xiao Bian (12 papers)
  4. Zhen Lei (205 papers)
  5. Stan Z. Li (222 papers)
Citations (1,247)

Summary

  • The paper introduces a two-step cascaded regression approach that enhances one-stage detector accuracy while maintaining speed.
  • The paper details a dual-module architecture with an Anchor Refinement Module filtering negatives and an Object Detection Module optimizing predictions.
  • Experimental results on PASCAL VOC and MS COCO reveal significant mAP improvements, highlighting RefineDet's potential for real-time applications.

Single-Shot Refinement Neural Network for Object Detection

The paper "Single-Shot Refinement Neural Network for Object Detection" introduces RefineDet, a framework that aims to combine the advantages of the two main object detection paradigms: the two-stage approach and the one-stage approach. Two-stage detectors such as Faster R-CNN provide high accuracy but are comparatively slow. One-stage detectors such as SSD (Single Shot MultiBox Detector) are efficient but usually less accurate. RefineDet seeks to raise the accuracy of one-stage methods while keeping their efficiency.

Network Architecture

RefineDet employs a two-module architecture: the Anchor Refinement Module (ARM) and the Object Detection Module (ODM).

  1. Anchor Refinement Module (ARM)
    • Functionality: ARM filters out negative anchors to reduce the classifier’s search space and coarsely adjusts the locations and sizes of anchors, providing better initialization for the subsequent regression.
    • Components: This module is constructed by modifying the layers of a base network, either VGG-16 or ResNet-101, pretrained on ImageNet.
  2. Object Detection Module (ODM)
    • Functionality: ODM takes the refined anchors from ARM as input, further improves the regression, and predicts multi-class labels.
    • Components: The ODM is built on the outputs of Transfer Connection Blocks (TCBs), which transfer features from ARM to ODM and preserve information crucial for accurate detection.
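The hand-off of refined anchors from ARM to ODM can be illustrated with a minimal Python sketch. It assumes an SSD-style box encoding, with boxes as (cx, cy, w, h) and offsets as (dx, dy, dw, dh); the function names, the variance values, and the sample numbers are hypothetical choices for illustration and are not taken from the paper's code.

```python
import math

# Assumed SSD-style encoding: boxes are (cx, cy, w, h), offsets are
# (dx, dy, dw, dh). The variance values are illustrative assumptions.
VARIANCES = (0.1, 0.2)


def decode(anchor, offsets, variances=VARIANCES):
    """Apply predicted offsets to one anchor box and return the adjusted box."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (
        cx + dx * variances[0] * w,       # shift the center horizontally
        cy + dy * variances[0] * h,       # shift the center vertically
        w * math.exp(dw * variances[1]),  # rescale the width
        h * math.exp(dh * variances[1]),  # rescale the height
    )


def two_step_decode(anchor, arm_offsets, odm_offsets):
    """Step 1: ARM coarsely adjusts the anchor. Step 2: ODM refines the result."""
    refined_anchor = decode(anchor, arm_offsets)
    final_box = decode(refined_anchor, odm_offsets)
    return refined_anchor, final_box


# Hypothetical numbers: ARM moves the anchor right, ODM then moves it down.
anchor = (0.5, 0.5, 0.2, 0.2)
refined, final = two_step_decode(
    anchor,
    arm_offsets=(1.0, 0.0, 0.0, 0.0),
    odm_offsets=(0.0, 1.0, 0.0, 0.0),
)
```

The point of the sketch is that the ODM offsets are applied to the ARM output rather than to the original anchor, so the second regressor starts from a better initialization.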

RefineDet leverages a novel multi-task loss function, enabling end-to-end network training. The two-step cascaded regression framework proposed in this paper helps to fine-tune object locations and sizes, significantly improving accuracy, particularly for smaller objects.
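A multi-task loss of this kind can be written as the sum of an ARM term and an ODM term. The formula below is a sketch, and every symbol in it is introduced here for illustration: p_i and x_i are the ARM's objectness confidence and coarse box offsets for anchor i, c_i and t_i are the ODM's class scores and box offsets, l_i^* and g_i^* are the ground-truth label and box matched to anchor i, N_arm and N_odm count the positive anchors seen by each module, and the bracket [l_i^* >= 1] equals 1 for positive anchors and 0 otherwise. L_b is assumed to be a binary classification loss, L_m a multi-class classification loss, and L_r a box regression loss such as smooth L1.

```latex
\mathcal{L}\bigl(\{p_i\},\{x_i\},\{c_i\},\{t_i\}\bigr)
  = \frac{1}{N_{\mathrm{arm}}}
    \Bigl( \sum_i L_b\bigl(p_i, [l_i^* \ge 1]\bigr)
         + \sum_i [l_i^* \ge 1]\, L_r\bigl(x_i, g_i^*\bigr) \Bigr)
  + \frac{1}{N_{\mathrm{odm}}}
    \Bigl( \sum_i L_m\bigl(c_i, l_i^*\bigr)
         + \sum_i [l_i^* \ge 1]\, L_r\bigl(t_i, g_i^*\bigr) \Bigr)
```

Because both terms are summed into a single objective, gradients from the ODM predictions also reach the ARM through the shared features, which is what makes end-to-end training possible.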

Key Contributions

  1. Two-Step Cascaded Regression: By introducing a two-step cascaded regression mechanism, RefineDet achieves more precise localization and better bounding box predictions.
  2. Negative Anchor Filtering: Early rejection of well-classified negative anchors helps in addressing the class imbalance problem, a common issue in one-stage object detectors.
  3. Transfer Connection Blocks (TCBs): TCBs facilitate the transfer of features from ARM to ODM, thus exploiting features more efficiently.
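Negative anchor filtering can be sketched in a few lines of Python. This is a minimal illustration, assuming the ARM outputs a background (negative) confidence per anchor; the function name, the default threshold of 0.99, and the sample scores are assumptions made for the example.

```python
def filter_negative_anchors(anchors, background_scores, threshold=0.99):
    """Drop anchors that the ARM confidently labels as background.

    Returns (index, anchor) pairs for the anchors that survive, so the
    ODM only has to classify and regress the remaining candidates.
    The threshold value is an illustrative assumption.
    """
    kept = []
    for index, (anchor, score) in enumerate(zip(anchors, background_scores)):
        if score <= threshold:
            kept.append((index, anchor))
    return kept


# Hypothetical anchors as (cx, cy, w, h) with ARM background confidences.
anchors = [
    (0.1, 0.1, 0.2, 0.2),
    (0.5, 0.5, 0.3, 0.3),
    (0.9, 0.9, 0.2, 0.2),
    (0.4, 0.6, 0.1, 0.4),
]
background_scores = [0.999, 0.30, 0.995, 0.80]

kept = filter_negative_anchors(anchors, background_scores)
kept_indices = [index for index, _ in kept]
```

Discarding the easy negatives before the second module reduces the ratio of background to object samples that the multi-class classifier has to handle.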

Experimental Results

The experimental validation of RefineDet was conducted on three primary datasets: PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO.

  • PASCAL VOC 2007: RefineDet320 achieved an mAP of 80.0%, outperforming several contemporary models such as SSD and YOLO. RefineDet512 achieved an mAP of 81.8%.
  • PASCAL VOC 2012: RefineDet demonstrated remarkable performance with an mAP of 80.1% using a 512x512 input size.
  • MS COCO: RefineDet512, with ResNet-101 as the backbone, achieved an mAP of 36.4%, a significant improvement over previous state-of-the-art one-stage detectors.

Additionally, when models pretrained on MS COCO were fine-tuned on PASCAL VOC, RefineDet reached mAP scores of up to 85.8% on PASCAL VOC 2007 and 86.8% on PASCAL VOC 2012. These results indicate that the approach benefits from additional training data and transfers well across datasets.

Implications and Future Directions

RefineDet narrows the accuracy-efficiency gap between one-stage and two-stage detectors, which makes it a candidate for real-time applications. Its performance across PASCAL VOC and MS COCO suggests potential for broader use, including autonomous driving, surveillance, and real-time analytics in resource-constrained environments.

Future advancements could include incorporating attention mechanisms to further enhance feature discrimination and accuracy. Additionally, addressing small object detection remains an area of interest; mechanisms such as feature pyramid networks (FPN) or even more sophisticated multi-scale feature integration techniques may provide further improvements.

By balancing accuracy against computational efficiency and introducing mechanisms for feature transfer and cascaded regression, RefineDet lays the groundwork for more efficient and accurate detectors.
