A Survey of Deep Learning-based Object Detection (1907.09408v2)

Published 11 Jul 2019 in cs.CV

Abstract: Object detection is one of the most important and challenging branches of computer vision, which has been widely applied in people's lives, such as monitoring security, autonomous driving and so on, with the purpose of locating instances of semantic objects of a certain class. With the rapid development of deep learning networks for detection tasks, the performance of object detectors has been greatly improved. In order to understand the main development status of the object detection pipeline thoroughly and deeply, in this survey, we first analyze the methods of existing typical detection models and describe the benchmark datasets. Afterwards and primarily, we provide a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors. Moreover, we list the traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system and point out a set of development trends to better follow the state-of-the-art algorithms and further research.

Authors (7)

Licheng Jiao
Fan Zhang
Fang Liu
Shuyuan Yang
Lingling Li
Zhixi Feng
Rong Qu

Summary

The paper provides a systematic comparison of one-stage and two-stage detection methods, highlighting key breakthroughs in speed and accuracy.
It reviews backbone networks for feature extraction, from deep models such as ResNet to lightweight options such as MobileNet.
The study identifies practical applications in autonomous driving, surveillance, and healthcare, and outlines future directions for efficiency and robustness.

A Survey of Deep Learning-based Object Detection: An Overview

Object detection is a central and difficult problem in computer vision, with practical applications such as security surveillance, autonomous driving, transportation monitoring, and robotic vision. The paper "A Survey of Deep Learning-based Object Detection" by Licheng Jiao et al. analyzes the progress and methods of deep learning-based object detection, covering the main detector families, their applications, and the principal trends and challenges in the field.

Major Contributions and Methodologies

Deep learning has revolutionized object detection by leveraging powerful neural network architectures and substantial computing resources. The paper systematically categorizes object detection methods into two primary approaches: one-stage and two-stage detectors.

Two-stage Detectors involve a two-step process where the first stage generates region proposals, and the second stage classifies these proposals and refines their boundaries. Representative models include:

R-CNN: Introduced the use of region proposals and CNNs for feature extraction, achieving significant improvements in detection performance.
Fast R-CNN: Enhanced R-CNN by sharing computations and using an RoI pooling layer for faster processing.
Faster R-CNN: Integrated the proposal generation into the network through Region Proposal Networks (RPNs), further improving speed and accuracy.
Mask R-CNN: Extended Faster R-CNN with a parallel mask-prediction branch and RoIAlign, adding instance segmentation alongside box detection.
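
To make the second stage more concrete, the following is a minimal sketch of RoI max pooling, the operation Fast R-CNN uses to turn a proposal of arbitrary size into a fixed-size feature grid. It is an illustration under simplifying assumptions, not the paper's implementation: the feature map has a single channel and is stored as nested Python lists, the RoI is already expressed in feature-map cells, and the function name roi_max_pool and its parameters are hypothetical.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2D feature map into an out_h x out_w grid.

    feature_map: list of rows (single channel, for illustration only).
    roi: (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Integer bin edges along y; every bin covers at least one cell.
        ys = y0 + (i * roi_h) // out_h
        ye = max(ys + 1, y0 + ((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            # Integer bin edges along x, built the same way.
            xs = x0 + (j * roi_w) // out_w
            xe = max(xs + 1, x0 + ((j + 1) * roi_w) // out_w)
            # Keep the strongest activation inside the bin.
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Example: two proposals of different sizes map to the same 2x2 output shape.
fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
whole = roi_max_pool(fm, (0, 0, 4, 4))
part = roi_max_pool(fm, (1, 1, 4, 3))
```

Because every proposal is reduced to the same grid size, the classifier and box regressor that follow can share one set of fully connected layers across all proposals.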

One-stage Detectors streamline the process by directly predicting object classes and bounding boxes from the input image without a separate proposal generation stage. Examples include:

YOLO (You Only Look Once): Pioneered a real-time detection system by framing object detection as a single regression problem.
SSD (Single Shot MultiBox Detector): Utilized multi-scale features and predefined anchor boxes to handle objects of various sizes within a single forward pass.
RetinaNet: Addressed the imbalance between foreground and background examples during training with the focal loss, which down-weights easy examples and improves the accuracy of one-stage models.
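
The focal loss mentioned for RetinaNet can be sketched in a few lines. The version below is the binary form for a single prediction, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The default values alpha=0.25 and gamma=2.0 are the settings commonly reported for RetinaNet and are an assumption here, since this overview does not state them; the function name focal_loss is illustrative.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p: predicted foreground probability, strictly between 0 and 1.
    y: 1 for a foreground example, 0 for a background example.
    alpha, gamma: assumed defaults, not taken from this overview.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background example contributes far less than under cross-entropy.
easy_bg_focal = focal_loss(0.01, 0)
easy_bg_ce = -math.log(0.99)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick way to sanity-check an implementation.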

Backbone Networks and Feature Extraction

The effectiveness of object detection models largely relies on the backbone networks responsible for feature extraction. The paper reviews various backbone architectures, including:

VGG, ResNet, and ResNeXt: Deep networks whose rich features are suited to capturing complex object representations.
Lightweight Networks (MobileNet, ShuffleNet): Offer efficient computation for real-time applications on resource-constrained devices.
Feature Pyramid Networks (FPN): Enhance multi-scale feature representations, crucial for detecting objects of varied sizes.
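
As an illustration of how FPN builds multi-scale features, the sketch below implements only the top-down merge: each coarser map is upsampled by a factor of two and added element-wise to the next finer map. It is deliberately simplified and should be read as an assumption-laden toy: maps are single-channel nested lists, each level is exactly half the resolution of the previous one, and the 1x1 lateral and 3x3 smoothing convolutions of the real architecture are omitted. The names upsample2x and top_down_merge are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D map stored as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def top_down_merge(laterals):
    """Fuse feature maps ordered fine -> coarse, FPN style.

    Each map is assumed to be half the height and width of the one before it.
    Returns the merged maps in the same fine -> coarse order.
    """
    merged = [laterals[-1]]  # the coarsest level passes through unchanged
    for lateral in reversed(laterals[:-1]):
        up = upsample2x(merged[0])
        fused = [[a + b for a, b in zip(lat_row, up_row)]
                 for lat_row, up_row in zip(lateral, up)]
        merged.insert(0, fused)
    return merged


# Three levels: 4x4, 2x2, and 1x1.
c3 = [[1] * 4 for _ in range(4)]
c4 = [[10] * 2 for _ in range(2)]
c5 = [[100]]
p3, p4, p5 = top_down_merge([c3, c4, c5])
```

In this toy example the finest output carries contributions from all three levels, which is the property that helps with detecting objects of different sizes.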

Applications and Future Directions

The paper surveys applications of object detection across several domains:

Autonomous Driving: Real-time detection systems are critical for safe navigation and obstacle avoidance.
Surveillance: Enhanced detection models improve monitoring and threat assessment capabilities.
Healthcare: Medical image analysis benefits from precise detection of anomalies and disease markers.
Remote Sensing: Detection of geological and man-made structures aids in environmental monitoring and urban planning.

The paper outlines several future directions for object detection, emphasizing the need for:

Improved Efficiency: Balancing detection speed and accuracy through optimized network designs and training procedures.
Robustness and Generalization: Enhancing detectors to perform reliably across varied and unseen domains.
Multi-task Learning: Integrating detection with other vision tasks (e.g., segmentation, tracking) to leverage shared representations and improve overall system performance.

Conclusion

This survey documents the advances in deep learning-based object detection made possible by new network architectures and training methods. The open challenges it identifies are efficiency, robustness across domains, and integration with other vision tasks; progress on these is likely to extend the reach of object detection in real-world applications.

By organizing the techniques and applications described in the paper, this overview gives researchers a compact picture of current trends and future directions in deep learning-based object detection.
