Deep Convolutional Neural Networks for Pedestrian Detection
The paper "Deep Convolutional Neural Networks for Pedestrian Detection," by D. Tomè and colleagues, addresses the difficult problem of pedestrian detection through deep learning, specifically Convolutional Neural Networks (CNNs). The topic is pertinent because pedestrian detection plays a critical role in fields such as automotive safety systems, surveillance, and robotics.
Background and Context
Pedestrian detection has historically relied on a pipeline of region proposal, feature extraction, and classification. Traditional methods depended heavily on handcrafted features, such as Histograms of Oriented Gradients (HOG) and Integral Channel Features (ICF), which set the benchmarks for years. Although recent advances in CNNs have delivered higher accuracy on related visual recognition tasks, their full potential for pedestrian detection remains untapped.
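The handcrafted-feature stage of such a traditional pipeline can be illustrated with a simplified HOG-style descriptor. The cell size, bin count, window size, and normalization below are illustrative choices for the sketch, not the exact parameters of any published detector:

```python
import numpy as np

def hog_features(patch, cell=8, n_bins=9):
    """Simplified HOG-style descriptor: per-cell histograms of unsigned
    gradient orientation, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    h, w = patch.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=n_bins, range=(0, 180), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-6))  # normalize
    return np.concatenate(feats)

# A 64x128 window (a common pedestrian template size) yields
# (64/8) * (128/8) = 128 cells of 9 bins each -> 1152 features,
# which a linear classifier (e.g., an SVM) would then score.
patch = np.random.default_rng(0).random((128, 64))
f = hog_features(patch)
```

The descriptor vector, not the raw pixels, is what the classification stage of the traditional pipeline consumes.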
This paper capitalizes on the strengths of CNNs, emphasizing their ability to learn complex features from data, and refines CNN architectures specifically for pedestrian detection. The work builds on past methods, notably that of Hosang et al., optimizing each stage of the pedestrian detection pipeline to improve accuracy while maintaining computational efficiency.
Methodology
The authors evaluate several region-proposal methods—Sliding Window, Selective Search, and Locally Decorrelated Channel Features (LDCF)—to identify regions likely to contain pedestrians. Despite its simplicity, the Sliding Window approach guarantees high recall, but at a steep computational cost. LDCF, tailored specifically to pedestrian detection, strikes a better balance, discarding most negative regions while preserving the positive ones.
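The computational cost of the sliding-window approach comes from sheer candidate count, which a sketch makes concrete. The window size, stride, and scale set below are illustrative assumptions, not the paper's settings:

```python
def sliding_windows(img_h, img_w, win_h=128, win_w=64,
                    stride=16, scales=(1.0, 1.5, 2.0)):
    """Enumerate candidate pedestrian windows at several scales.
    Returns (x, y, w, h) boxes in original-image coordinates."""
    boxes = []
    for s in scales:
        wh, ww = int(win_h * s), int(win_w * s)
        for y in range(0, img_h - wh + 1, int(stride * s)):
            for x in range(0, img_w - ww + 1, int(stride * s)):
                boxes.append((x, y, ww, wh))
    return boxes

# Even a modest 640x480 frame at three scales produces over a
# thousand candidate regions, every one of which the classifier
# must score -- hence the appeal of learned proposals like LDCF.
boxes = sliding_windows(480, 640)
```

Denser strides and more scales, which real detectors need for high recall, multiply this count further.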
The paper highlights fine-tuning of models such as AlexNet and GoogLeNet, pre-trained on the ImageNet dataset, to adapt them to pedestrian detection. This fine-tuning is critical given the lack of large annotated pedestrian datasets. Moreover, preprocessing steps such as padding and negative-sample decorrelation are applied to make the training data more robust.
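The idea behind fine-tuning with scarce labels—reuse a frozen pre-trained feature extractor and train only a new task-specific head—can be sketched without any deep-learning framework. Here a fixed random projection stands in for the pre-trained backbone (AlexNet/GoogLeNet in the paper), and the labels are synthetic; everything about this setup is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: a frozen random projection + ReLU.
# Only the new binary head is trained, the essence of fine-tuning.
D_IN, D_FEAT = 32, 64
W_frozen = rng.standard_normal((D_IN, D_FEAT))

def backbone(x):
    return np.maximum(x @ W_frozen, 0.0) / np.sqrt(D_IN)  # frozen features

# Toy "pedestrian vs. background" data, separable in feature space.
X = rng.standard_normal((200, D_IN))
F = backbone(X)
w_true = rng.standard_normal(D_FEAT)
y = (F @ w_true > 0).astype(float)

# Fine-tune: gradient descent on a logistic-regression head;
# the backbone weights are never updated.
w, b = np.zeros(D_FEAT), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    w -= 0.5 * F.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean(((F @ w + b) > 0) == (y > 0.5))
```

Because only the small head is learned, far fewer labeled pedestrians are needed than training the whole network from scratch would require.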
To further improve accuracy, the researchers feed the region-proposal scores, alongside the CNN outputs, into a final binary classifier, so that both stages contribute to the assessment of each candidate region. This fusion enhances the detection efficacy of the final system.
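Why fusing two imperfect scores helps can be shown with a minimal sketch: two synthetic scores, each seeing the true label through independent noise, are combined by a tiny learned logistic-regression classifier. The data, noise levels, and training settings are all illustrative assumptions, not the paper's actual scores or classifier:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy scores: each detector observes the label through independent noise.
n = 5000
y = (rng.random(n) > 0.5).astype(float)
ldcf_score = y + rng.normal(0.0, 1.0, n)   # stand-in proposal score
cnn_score = y + rng.normal(0.0, 1.0, n)    # stand-in network score

# Train a tiny logistic-regression combiner over the two scores.
X = np.column_stack([ldcf_score, cnn_score])
w, b = np.zeros(2), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / n
    b -= 0.5 * np.mean(p - y)

acc_ldcf = np.mean((ldcf_score > 0.5) == (y > 0.5))
acc_cnn = np.mean((cnn_score > 0.5) == (y > 0.5))
acc_fused = np.mean(((X @ w + b) > 0) == (y > 0.5))
```

Because the two noise sources are independent, the fused decision is more reliable than either score alone, which is the intuition behind combining the proposal and CNN stages.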
Results and Performance Evaluation
The pipeline optimization led to significant improvements in pedestrian detection accuracy. On the standard Caltech Pedestrian Dataset, the proposed system achieved a miss rate of 0.199 at 0.1 false positives per image (FPPI), outperforming several traditional and contemporary approaches. In particular, these results surpass those of comparable systems built on purely handcrafted features or on general-purpose object-detection networks.
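The miss-rate-at-FPPI operating point used in this evaluation can be computed from a ranked list of matched detections. This is a minimal sketch of the metric, assuming detections have already been matched to ground truth; it omits the matching step and the log-averaging used by the full Caltech protocol:

```python
import numpy as np

def miss_rate_at_fppi(scores, is_tp, n_gt, n_images, fppi_target=0.1):
    """Miss rate at a false-positives-per-image budget, given detection
    scores and their true/false-positive flags over a whole test set."""
    order = np.argsort(-np.asarray(scores))           # high scores first
    tp = np.cumsum(np.asarray(is_tp, dtype=float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, dtype=float)[order])
    fppi = fp / n_images                              # false pos. per image
    miss = 1.0 - tp / n_gt                            # missed ground truths
    ok = fppi <= fppi_target
    # lowest miss rate achievable without exceeding the FPPI budget
    return float(miss[ok].min()) if ok.any() else 1.0

# Toy example: 4 detections over 10 images containing 4 pedestrians.
mr = miss_rate_at_fppi(scores=[0.9, 0.8, 0.7, 0.6],
                       is_tp=[1, 1, 0, 1],
                       n_gt=4, n_images=10)
```

Lower is better: a miss rate of 0.199 at 0.1 FPPI means roughly 80% of pedestrians are found while averaging one false alarm per ten images.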
In terms of computational performance, the DeepPed framework balances accuracy with efficiency, requiring approximately 530 ms per frame on a desktop machine. The authors also present a lightweight version of the algorithm that achieves real-time performance on embedded hardware such as the NVIDIA Jetson TK1.
Future Implications
This research contributes to the theoretical and practical understanding of applying deep learning to pedestrian detection. The results could propel advancements in real-world applications, offering robust, efficient detection systems in smart cars and autonomous systems. Future developments may involve further optimization of CNN architectures and complete GPU implementations to enhance both accuracy and computational speed. Additionally, the exploration of newer deep learning models and augmented datasets could drive continued improvements in pedestrian detection technologies.
In summary, this paper demonstrates a comprehensive approach to enhancing pedestrian detection through CNNs, improving accuracy without sacrificing computational efficiency, and sets a precedent for future research in automatic visual detection systems.