Pedestrian Detection aided by Deep Learning Semantic Tasks (1412.0069v1)

Published 29 Nov 2014 in cs.CV

Abstract: Deep learning methods have achieved great success in pedestrian detection, owing to its ability to learn features from raw pixels. However, they mainly capture middle-level representations, such as pose of pedestrian, but confuse positive with hard negative samples, which have large ambiguity, e.g. the shape and appearance of 'tree trunk' or 'wire pole' are similar to pedestrian in certain viewpoint. This ambiguity can be distinguished by high-level representation. To this end, this work jointly optimizes pedestrian detection with semantic tasks, including pedestrian attributes (e.g. 'carrying backpack') and scene attributes (e.g. 'road', 'tree', and 'horizontal'). Rather than expensively annotating scene attributes, we transfer attributes information from existing scene segmentation datasets to the pedestrian dataset, by proposing a novel deep model to learn high-level features from multiple tasks and multiple data sources. Since distinct tasks have distinct convergence rates and data from different datasets have different distributions, a multi-task objective function is carefully designed to coordinate tasks and reduce discrepancies among datasets. The importance coefficients of tasks and network parameters in this objective function can be iteratively estimated. Extensive evaluations show that the proposed approach outperforms the state-of-the-art on the challenging Caltech and ETH datasets, where it reduces the miss rates of previous deep models by 17 and 5.5 percent, respectively.

Citations (431)

Summary

  • The paper introduces TA-CNN, a multi-task framework that integrates semantic tasks to significantly boost pedestrian detection performance.
  • The method transfers scene attributes from external datasets to enrich feature learning and effectively reduce false positives.
  • Quantitative evaluations on Caltech and ETH datasets show miss rate reductions of 17% and 5.5%, respectively, outperforming previous models.

Overview of "Pedestrian Detection Aided by Deep Learning Semantic Tasks"

The paper "Pedestrian Detection aided by Deep Learning Semantic Tasks" addresses the problem of pedestrian detection by incorporating deep learning techniques enhanced with semantic tasks. The authors Yonglong Tian, Ping Luo, Xiaogang Wang, and Xiaoou Tang propose a novel methodology that leverages auxiliary semantic knowledge to improve the discriminative power when separating pedestrians from challenging background patterns.

Methodology

The core of the proposed approach is the Task-Assistant Convolutional Neural Network (TA-CNN), which integrates multiple auxiliary semantic tasks, such as pedestrian and scene attributes, into the detection pipeline. The TA-CNN is designed to overcome limitations found in existing models that predominantly focus on middle-level features, which often result in the misclassification of difficult negative samples, like tree trunks or wire poles that may resemble human shapes.
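The separation between shared features and task-specific outputs can be illustrated with a minimal sketch. All shapes, head names, and the single linear layer below are illustrative stand-ins, not the paper's actual TA-CNN architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_features(x, W):
    """Stand-in for the shared feature trunk:
    a single linear layer followed by ReLU."""
    return np.maximum(0.0, x @ W)

# Hypothetical shapes: 8-dim window descriptor, 4-dim shared feature.
W_shared = rng.normal(size=(8, 4))
heads = {
    "pedestrian": rng.normal(size=(4, 1)),  # main detection head
    "backpack":   rng.normal(size=(4, 1)),  # pedestrian-attribute head
    "road":       rng.normal(size=(4, 1)),  # scene-attribute head
}

x = rng.normal(size=(2, 8))                 # two candidate windows
h = shared_features(x, W_shared)            # shared representation
logits = {name: h @ W_head for name, W_head in heads.items()}
```

Because every head reads the same representation `h`, gradients from the auxiliary attribute tasks shape the shared features that the main pedestrian head also uses, which is the mechanism by which semantic tasks help discriminate hard negatives.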

The proposed methodology transfers attribute information from existing scene segmentation datasets to the pedestrian detection dataset. This transfer enables the learning of high-level discriminative features through a multi-task learning framework. Each task carries an associated importance coefficient, estimated iteratively alongside the network parameters, to balance the contributions of each task and dataset.
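The balancing of tasks can be sketched as a weighted multi-task objective with an alternating update of coefficients and network parameters. The re-estimation heuristic below (up-weighting currently easier tasks) is a hypothetical placeholder, not the paper's estimation rule:

```python
import numpy as np

def weighted_multi_task_loss(main_loss, aux_losses, coeffs):
    """Main detection loss plus importance-weighted auxiliary losses."""
    return main_loss + float(np.dot(coeffs, aux_losses))

def reestimate_coeffs(aux_losses):
    """Illustrative re-estimation step: give more weight to tasks with
    smaller current loss, normalized to sum to 1. A placeholder heuristic,
    not the coefficient estimation used in the paper."""
    inv = 1.0 / (np.asarray(aux_losses) + 1e-8)
    return inv / inv.sum()

# Alternating optimization sketch: fix coefficients, update the network
# (omitted here), then re-estimate coefficients from current task losses.
aux = np.array([0.9, 0.3, 0.6])   # hypothetical auxiliary-task losses
coeffs = np.full(3, 1.0 / 3)      # uniform initialization
total = weighted_multi_task_loss(1.2, aux, coeffs)
coeffs = reestimate_coeffs(aux)
```

The alternation mirrors the paper's idea that coefficients and network parameters are estimated iteratively, so that tasks with different convergence rates do not dominate the shared objective.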

Results

Quantitative evaluations on the challenging Caltech and ETH datasets demonstrate that TA-CNN reduces miss rates by 17% and 5.5%, respectively, relative to the best-performing deep models at the time. This reduction highlights the effectiveness of the multi-task learning strategy in enhancing the robustness of pedestrian detectors.

Key Contributions

  1. Multi-Task Learning: By jointly optimizing pedestrian detection with semantic tasks, the method enriches feature learning to handle substantial pedestrian and background variations.
  2. Attribute Transfer and Learning: Scene attributes are transferred from existing datasets, bypassing the need for extensive manual annotation, while pedestrian attributes are directly labeled on the dataset.
  3. Task Coordination: A novel objective function manages distinct task convergence rates, coordinating learning processes effectively across tasks and datasets.
  4. Improved Detection Performance: Systematic evaluations indicate superior performance over state-of-the-art models by integrating semantic knowledge into feature learning.

Implications and Future Directions

The TA-CNN framework demonstrates that incorporating semantic tasks yields substantial improvements over both traditional and prior deep models by learning richer, high-level features. It sets a precedent for future work on richer attribute configurations and on extending the approach toward comprehensive scene understanding tasks such as scene parsing. Efficient implementation strategies, including further optimization of hard-negative mining, could also extend its real-world applicability in intelligent surveillance systems.

The enduring impact of this research lies in its demonstration of the power of semantic task integration in aiding complex vision tasks, potentially guiding future innovations in multi-task learning paradigms for computer vision applications.