A Survey of Modern Deep Learning based Object Detection Models (2104.11892v2)

Published 24 Apr 2021 in cs.CV, cs.LG, and eess.IV

Abstract: Object Detection is the task of classification and localization of objects in an image or video. It has gained prominence in recent years due to its widespread applications. This article surveys recent developments in deep learning-based object detectors. A concise overview of benchmark datasets and evaluation metrics used in detection is also provided, along with some of the prominent backbone architectures used in recognition tasks. It also covers contemporary lightweight classification models used on edge devices. Lastly, we compare the performances of these architectures on multiple metrics.

Authors (6)

Syed Sahil Abbas Zaidi
Mohammad Samar Ansari
Asra Aslam
Nadia Kanwal
Mamoona Asghar
Brian Lee

Citations (643)

Summary

This paper presents a detailed examination of recent advancements in deep learning-based object detection, outlining significant architectures, datasets, evaluation metrics, and emerging trends in the domain. The authors offer a rigorous assessment of both single-stage and two-stage detectors, as well as lightweight models designed for deployment on edge devices.

Overview of Object Detection Models

The paper categorizes object detectors into two groups: two-stage and single-stage models. Two-stage detectors, such as R-CNN and its variants, involve a preliminary region proposal process followed by classification and localization. These models are known for their accuracy but often fall short in real-time applications due to their complex pipeline and slower inference times.

Conversely, single-stage detectors like YOLO and SSD predict class probabilities and bounding boxes in a single pass, which makes them faster and simpler. They are particularly advantageous in scenarios requiring real-time detection. YOLOv4 and EfficientDet, for example, offer notable improvements in efficiency and accuracy, narrowing the performance gap traditionally seen between the two categories.

Backbone Architectures

The paper discusses several backbone architectures crucial for feature extraction in detection models:

Convolutional Neural Networks (CNNs): The resurgence of CNNs with models like AlexNet and VGG has dramatically influenced object detection by providing rich feature representations.
Residual Networks (ResNets): The introduction of residual learning addressed the degradation problem in deep networks, and ResNet variants remain prevalent in modern object detectors.
Efficient Networks: Architectures such as EfficientNet leverage neural architecture search and scaling techniques to balance accuracy and efficiency, showcasing the trend toward resource-optimal models.
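
The residual learning idea mentioned above can be illustrated with a minimal sketch. It assumes a toy block built from two fully connected layers in plain Python rather than the convolutional layers used in actual ResNets, and the names `linear`, `relu`, and `residual_block` are hypothetical helpers introduced only for this example. The point is the skip connection: the block computes F(x) + x, so when the learned layers contribute nothing, the input passes through unchanged.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def linear(weights, values):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Toy residual block: relu(F(x) + x), with F = linear -> relu -> linear.

    Assumes w1 and w2 are square so that F(x) has the same length as x.
    """
    fx = linear(w2, relu(linear(w1, x)))
    # Skip connection: add the input back before the final activation.
    return relu([f + xi for f, xi in zip(fx, x)])


# With all-zero weights, F(x) is zero and the block reduces to the identity
# for non-negative inputs.
zero = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], zero, zero))
```

Because the identity mapping is available "for free" through the skip connection, stacking more such blocks should not, in principle, make the network harder to fit than a shallower one, which is the intuition behind addressing the degradation problem.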

Lightweight Networks

With the increasing demand for deploying models on mobile and edge devices, the paper emphasizes lightweight networks as a crucial trend. It covers models such as MobileNetV3 and ShuffleNet, which use techniques such as depthwise separable convolutions and channel shuffling to substantially reduce computational cost while keeping accuracy competitive.
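
To make the savings from depthwise separable convolutions concrete, the following sketch counts multiply-add operations for a standard convolution and for its depthwise separable counterpart. It assumes stride 1, an output feature map of the same spatial size as the input, and no bias terms; the function names and the layer dimensions in the example are illustrative choices, not values taken from the paper.

```python
def standard_conv_mult_adds(k, c_in, c_out, h, w):
    """Multiply-adds for a k x k standard convolution on an h x w output."""
    return k * k * c_in * c_out * h * w


def separable_conv_mult_adds(k, c_in, c_out, h, w):
    """Multiply-adds for a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 64 -> 128 channels, 56 x 56 feature map.
std = standard_conv_mult_adds(3, 64, 128, 56, 56)
sep = separable_conv_mult_adds(3, 64, 128, 56, 56)
print(std, sep, sep / std)
```

The ratio of the two costs simplifies to 1/c_out + 1/k², so for a 3 x 3 kernel with many output channels the separable form needs roughly one eighth to one ninth of the computation.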

Datasets and Evaluation Metrics

The survey details prominent datasets such as PASCAL VOC, MS COCO, ILSVRC, and Open Images, each of which provides a widely used benchmark for model evaluation. These datasets pose varied challenges, such as class imbalance and real-world applicability. Detectors are compared using mean Average Precision (mAP), with Intersection over Union (IoU) measuring how well a predicted box overlaps its ground truth.
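
As a rough illustration of the IoU metric, the sketch below computes the overlap between two axis-aligned boxes. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates with x1 < x2 and y1 < y2; the function names and the 0.5 default threshold are assumptions for this example (0.5 is the threshold commonly associated with PASCAL VOC-style evaluation, while other benchmarks average over several thresholds).

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """Treat a prediction as a correct localization if its IoU meets the threshold."""
    return iou(pred_box, gt_box) >= threshold


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
```

Average Precision is then computed per class from the precision-recall curve obtained by counting matched and unmatched predictions in order of confidence, and mAP averages that value across classes.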

Emerging Trends and Future Directions

The paper identifies several challenges and future directions for the field:

AutoML and NAS: Automated machine learning and neural architecture search present opportunities for optimizing model design with minimal manual intervention.
Weakly Supervised Learning: Reducing dependency on extensive labeled data can significantly lower the costs and efforts involved in training robust detection models.
Real-Time and 3D Detection: Enhancing real-time performance and exploring 3D object detection remain critical for applications like autonomous driving and video analytics.
Domain Adaptation: Facilitating domain transfer across different detection tasks can promote the applicability of models in diverse environments.

Conclusion and Implications

This comprehensive survey underscores the rapid evolution and diversification of object detection models. It highlights the balance between achieving state-of-the-art accuracy and ensuring deployment feasibility on constrained devices. As deep learning paradigms continue to mature, the focus will likely remain on improving efficiency, scalability, and adaptability, paving the way for more significant advances in both theoretical and practical aspects of computer vision. The paper serves as a valuable resource for researchers aiming to stay informed of recent developments and challenges in the field of object detection.
