Recent Advances in Deep Learning for Object Detection (1908.03673v1)

Published 10 Aug 2019 in cs.CV, cs.LG, and cs.MM

Abstract: Object detection is a fundamental visual recognition problem in computer vision and has been widely studied in the past decades. Visual object detection aims to find objects of certain target classes with precise localization in a given image and assign each object instance a corresponding class label. Due to the tremendous successes of deep learning based image classification, object detection techniques using deep learning have been actively studied in recent years. In this paper, we give a comprehensive survey of recent advances in visual object detection with deep learning. By reviewing a large body of recent related work in literature, we systematically analyze the existing object detection frameworks and organize the survey into three major parts: (i) detection components, (ii) learning strategies, and (iii) applications & benchmarks. In the survey, we cover a variety of factors affecting the detection performance in detail, such as detector architectures, feature learning, proposal generation, sampling strategies, etc. Finally, we discuss several future directions to facilitate and spur future research for visual object detection with deep learning. Keywords: Object Detection, Deep Learning, Deep Convolutional Neural Networks

Authors (3)

Xiongwei Wu (16 papers)
Doyen Sahoo (47 papers)
Steven C. H. Hoi (94 papers)

Citations (721)


Summary

Advances in Deep Learning for Object Detection

Object detection is a long-studied problem in computer vision, and deep learning has driven most of its recent progress. The surveyed paper reviews these advances, focusing on how deep learning techniques are integrated into object detection frameworks. It organizes the survey into three major parts: detection components, learning strategies, and applications and benchmarks.

Detection Frameworks

The paper classifies object detection frameworks into two primary categories: two-stage detectors and one-stage detectors. Two-stage detectors such as R-CNN and its variants (Fast R-CNN, Faster R-CNN) first generate object proposals and then perform classification and bounding box regression on each proposal. They have generally set the benchmark for detection accuracy. One-stage detectors like YOLO and SSD skip the proposal step and predict bounding box coordinates and class probabilities directly from dense grid cells across the image, typically trading some accuracy for speed.
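To make the "dense grid cells" idea concrete, here is a minimal sketch of how a one-stage detector's per-cell outputs could be turned into boxes. The function names and the prediction layout (score, class_id, dx, dy, w, h) are assumptions chosen for illustration; YOLO and SSD each use their own parameterization, which the summary does not specify.

```python
def grid_cell_centers(image_h, image_w, stride):
    """Return the (cx, cy) center of every stride x stride cell covering the image."""
    return [
        (x + stride / 2.0, y + stride / 2.0)
        for y in range(0, image_h, stride)
        for x in range(0, image_w, stride)
    ]


def decode_dense_predictions(preds, centers, stride, score_threshold=0.5):
    """Turn one prediction per grid cell into (x1, y1, x2, y2, score, class_id) boxes.

    Assumed (hypothetical) layout of each prediction:
    (score, class_id, dx, dy, w, h), where dx and dy are center offsets
    in units of stride and w and h are box sizes in pixels.
    """
    boxes = []
    for (score, class_id, dx, dy, w, h), (cx, cy) in zip(preds, centers):
        if score < score_threshold:
            continue  # drop low-confidence cells, usually the vast majority
        bx, by = cx + dx * stride, cy + dy * stride
        boxes.append(
            (bx - w / 2.0, by - h / 2.0, bx + w / 2.0, by + h / 2.0, score, class_id)
        )
    return boxes


centers = grid_cell_centers(64, 96, 32)  # 2 rows x 3 columns of cells
preds = [(0.1, 0, 0.0, 0.0, 10.0, 10.0)] * 5 + [(0.9, 2, 0.25, -0.25, 20.0, 10.0)]
boxes = decode_dense_predictions(preds, centers, stride=32)
```

A two-stage detector would instead run a second classification and regression pass on a much smaller set of proposals. In practice both families usually follow decoding with non-maximum suppression, which is omitted here.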

Learning Strategies

Training effective object detectors involves two significant challenges: class imbalance and localization accuracy. Imbalance between the few object regions and the many background regions is addressed with techniques such as hard negative mining and focal loss. Localization is often refined with multiple regression stages to improve bounding box precision. The survey also covers data augmentation and training strategies such as adversarial learning and knowledge distillation.
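The two imbalance techniques named above can be sketched in a few lines. This is an illustrative, pure-Python version under stated assumptions: the focal loss follows the commonly cited form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t) with defaults gamma = 2 and alpha = 0.25, and the mining helper keeps the highest-loss negatives at an assumed 3:1 negative-to-positive ratio. The function names are hypothetical and not taken from the paper.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction; p is P(positive), y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss: cross-entropy down-weighted by (1 - p_t)^gamma for easy examples."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mine_hard_negatives(losses, labels, neg_pos_ratio=3):
    """Keep all positives plus the highest-loss negatives, capped at neg_pos_ratio per positive."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(neg), neg_pos_ratio * max(len(pos), 1))
    hardest = sorted(neg, key=lambda i: losses[i], reverse=True)[:k]
    return sorted(pos + hardest)


# An easy background example (p = 0.01) contributes far less under focal loss
# than under plain cross-entropy, while a hard one (p = 0.9) is reduced much less.
easy_ratio = binary_focal_loss(0.01, 0) / cross_entropy(0.01, 0)
hard_ratio = binary_focal_loss(0.9, 0) / cross_entropy(0.9, 0)

kept = mine_hard_negatives(
    losses=[0.1, 2.0, 0.5, 0.05, 0.9, 0.01],
    labels=[1, 0, 0, 0, 0, 0],
)
```

Both approaches target the same problem from different sides: mining discards easy negatives outright, while focal loss keeps every example but shrinks the contribution of the easy ones.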

Applications and Benchmarks

Application-driven research in object detection has intensified, with specialized adaptations for tasks such as face detection and pedestrian detection. For instance, face detection involves challenges related to occlusion and varying scales, requiring models that can handle extreme intra-class variance. Pedestrian detection confronts similar issues in crowded scenarios, thus demanding robust feature representations that emphasize scale and contextual information.

Benchmarks such as Pascal VOC and MS COCO remain the standard for evaluating detection accuracy and speed. The paper reports results on these benchmarks and details the progress achieved by various models over the years.
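Accuracy on these benchmarks rests on matching predicted boxes to ground-truth boxes by intersection over union (IoU). Pascal VOC counts a detection as correct at IoU of at least 0.5, while MS COCO averages precision over IoU thresholds from 0.5 to 0.95. The sketch below shows only the matching test for a single pair of boxes; the function names are hypothetical, boxes are assumed to be (x1, y1, x2, y2) tuples, and the full average-precision computation is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Pascal VOC style check: a prediction matches if IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

Raising the threshold, as the stricter COCO settings do, rewards detectors that localize tightly rather than merely overlapping the object.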

Implications and Future Directions

The implications of these advancements are broad, with improvements in detection frameworks likely impacting a variety of fields including autonomous driving and surveillance systems. The move towards anchor-free detection and the exploration of AutoML for automatic architecture design are particularly promising directions for future innovation. Challenges such as low-shot detection and the need for scalable, efficient models remain open areas for research.

Overall, the paper highlights deep learning's transformative impact on object detection, underscoring the importance of continuous exploration and development in optimizing detection algorithms for real-world applications.
