Scale-aware Fast R-CNN for Pedestrian Detection (1510.08160v3)

Published 28 Oct 2015 in cs.CV

Abstract: In this work, we consider the problem of pedestrian detection in natural scenes. Intuitively, instances of pedestrians with different spatial scales may exhibit dramatically different features. Thus, large variance in instance scales, which results in undesirable large intra-category variance in features, may severely hurt the performance of modern object instance detection methods. We argue that this issue can be substantially alleviated by the divide-and-conquer philosophy. Taking pedestrian detection as an example, we illustrate how we can leverage this philosophy to develop a Scale-Aware Fast R-CNN (SAF R-CNN) framework. The model introduces multiple built-in sub-networks which detect pedestrians with scales from disjoint ranges. Outputs from all the sub-networks are then adaptively combined to generate the final detection results that are shown to be robust to large variance in instance scales, via a gate function defined over the sizes of object proposals. Extensive evaluations on several challenging pedestrian detection datasets well demonstrate the effectiveness of the proposed SAF R-CNN. Particularly, our method achieves state-of-the-art performance on Caltech, INRIA, and ETH, and obtains competitive results on KITTI.

Authors (6)

Jianan Li
Xiaodan Liang
Tingfa Xu
Jiashi Feng
Shuicheng Yan
Shengmei Shen

Citations (761)

Summary

An Analytical Overview of "Scale-aware Fast R-CNN for Pedestrian Detection"

The paper "Scale-aware Fast R-CNN for Pedestrian Detection" by Jianan Li et al. addresses pedestrian detection in natural scenes, focusing on the challenges posed by the large variance in pedestrian instance scales. It introduces Scale-Aware Fast R-CNN (SAF R-CNN), a framework that handles this variance by partitioning the detection task according to instance scale.

Core Contributions

The paper's main contributions are:

Introduction of SAF R-CNN: An adaptation of the Fast R-CNN framework, integrating two sub-networks dedicated to detecting pedestrians at disjoint scale ranges.
Scale-aware Weighting Mechanism: A novel mechanism dynamically combining outputs from the large-size and small-size sub-networks based on the input proposal size, realized through a gate function.
State-of-the-Art Performance: Empirical results showing state-of-the-art performance on the Caltech, INRIA, and ETH pedestrian detection benchmarks and competitive results on KITTI.

Methodology

Problem Definition

The paper highlights the large variance in the spatial scales of pedestrians in images, which complicates detection because small and large instances exhibit markedly different features. Traditional methods either augment the training data to handle scale variance or apply a single multi-scale model; however, because one set of parameters must then cover every scale, these strategies often fail to capture the distinct characteristics of each scale range.
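To make the scale problem concrete, this short Python sketch estimates how many feature-map cells a pedestrian covers on a stride-16 map such as VGG16's conv5_3, the layer Fast R-CNN pools from. The 0.41 width-to-height ratio is an assumption (a common standard for Caltech pedestrian boxes), and the heights are illustrative rather than taken from the paper.

```python
# Illustrative arithmetic (not from the paper): feature-map footprint of a
# pedestrian box on a stride-16 map such as VGG16 conv5_3.
STRIDE = 16    # conv5_3 follows four 2x2 max-pools: 2**4 = 16
ASPECT = 0.41  # assumed width/height ratio of a pedestrian box


def footprint(height_px: float, stride: int = STRIDE) -> tuple[float, float]:
    """Return the approximate (rows, cols) of feature cells covered by the box."""
    width_px = ASPECT * height_px
    return height_px / stride, width_px / stride


for h in (50, 100, 300):
    rows, cols = footprint(h)
    print(f"{h:>3} px tall -> {rows:.1f} x {cols:.1f} cells")
```

A 50-pixel pedestrian spans only about 3 × 1 cells, while a 300-pixel one spans roughly 19 × 8, so the two are described by very different amounts of spatial detail.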

Model Architecture

SAF R-CNN enhances the Fast R-CNN model by incorporating:

Shared Convolutional Layers: The initial convolutional layers extract generic features from the input image and are shared by both sub-networks.
Scale-Specific Sub-networks: Two sub-networks, one for large-size and one for small-size pedestrians, further process these shared features.
RoI Pooling: Max-pools the features of each region proposal into a fixed-size feature map (7×7 for VGG16 in Fast R-CNN), which is fed into fully connected layers.
Scale-aware Fusion: A scale-aware weighting layer combines the outputs of the two sub-networks, with weights computed from the proposal size by a gate function, so that the sub-network better suited to the proposal's scale has more influence on the final prediction.
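The fusion step can be sketched in plain Python. This assumes a sigmoid-shaped gate over the proposal height h, w_l = 1 / (1 + alpha * exp(-(h - h_mean) / beta)) with w_s = 1 - w_l, where h_mean is a reference height (for example, the mean pedestrian height in the training data) and alpha, beta are fixed illustrative constants here, whereas a trained model would learn them. The function names and the five-element output layout (one score plus four box offsets) are hypothetical. A hard-threshold variant is included for contrast.

```python
import math
from typing import Sequence


def large_weight(h: float, h_mean: float, alpha: float = 1.0, beta: float = 10.0) -> float:
    """Assumed gate: weight of the large-size sub-network for a proposal of height h."""
    z = (h - h_mean) / beta
    z = max(-50.0, min(50.0, z))  # clamp so math.exp cannot overflow on extreme heights
    return 1.0 / (1.0 + alpha * math.exp(-z))


def soft_fuse(h: float, h_mean: float, out_large: Sequence[float],
              out_small: Sequence[float], alpha: float = 1.0,
              beta: float = 10.0) -> list[float]:
    """Weighted sum of the two sub-networks' outputs (scores and box offsets alike)."""
    w_l = large_weight(h, h_mean, alpha, beta)
    w_s = 1.0 - w_l
    return [w_l * a + w_s * b for a, b in zip(out_large, out_small)]


def hard_fuse(h: float, threshold: float, out_large: Sequence[float],
              out_small: Sequence[float]) -> list[float]:
    """Contrast case: route each proposal to exactly one sub-network by height."""
    return list(out_large) if h >= threshold else list(out_small)


# Hypothetical per-proposal outputs: [pedestrian score, dx, dy, dw, dh]
large_out = [0.9, 0.01, 0.02, 0.00, 0.05]
small_out = [0.6, 0.03, 0.00, 0.01, 0.02]
for h in (40, 100, 200):
    print(h, round(large_weight(h, 100.0), 3), soft_fuse(h, 100.0, large_out, small_out)[0])
```

Near h_mean both sub-networks contribute, and the weight shifts smoothly toward one of them as h moves away; the hard variant instead switches abruptly at the threshold.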

Empirical Results

In thorough evaluations across several benchmarks:

Caltech Dataset: SAF R-CNN achieves a log-average miss rate of 9.32%, outperforming existing methods such as CompACT-Deep.
INRIA and ETH Datasets: Demonstrated robust generalization, with significantly lower miss rates than prior methods on both datasets.
KITTI Dataset: Competitive average precision (AP) scores, indicating robustness in complex driving scenes.
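The log-average miss rate used on Caltech summarizes a miss-rate versus false-positives-per-image (FPPI) curve as the geometric mean of the miss rate sampled at nine FPPI values evenly spaced in log space between 10^-2 and 10^0. The following is a minimal sketch of that computation, assuming the curve is given as FPPI values sorted in increasing order with matching miss rates; the official Caltech evaluation code may differ in small interpolation details, and the function name is hypothetical.

```python
import math
from typing import Sequence


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float],
                          num_points: int = 9, lo: float = 1e-2,
                          hi: float = 1e0) -> float:
    """Geometric mean of the miss rate sampled at FPPI values evenly spaced in
    log space between lo and hi (num_points >= 2). `fppi` must be sorted in
    increasing order, with miss_rate[i] the miss rate at fppi[i]."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (num_points - 1)
    samples = []
    for k in range(num_points):
        ref = 10 ** (log_lo + k * step)
        mr = 1.0  # no operating point at or below this FPPI: count as all missed
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m  # keep the last (highest-FPPI) point not exceeding ref
            else:
                break
        samples.append(max(mr, 1e-10))  # floor avoids log(0)
    return math.exp(sum(math.log(s) for s in samples) / num_points)


print(log_average_miss_rate([0.01, 0.1, 1.0], [0.5, 0.2, 0.1]))
```

Lower values are better; averaging in log space keeps the summary from being dominated by the high-miss-rate end of the curve.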

Analysis of Components

The paper provides an in-depth analysis of the model's components:

Feature Map Size: Larger feature maps improved the detection performance, particularly for small-scale instances.
Shared Convolutional Layers: Sharing the first seven convolutional layers of VGG16 between the sub-networks gave the best balance between detection performance and computational efficiency.
Scale-aware Weighting: Assigning weights adaptively with a gate function of the proposal height proved more effective than fixed weights or a hard threshold.
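The shared/branch split of VGG16 can be written out explicitly. This sketch assumes the seven shared layers are the first seven in network order (conv1_1 through conv3_3), so each sub-network keeps its own copy of the remaining six; the helper names are hypothetical. It also reports each layer's output stride, since deeper layers produce coarser feature maps.

```python
# VGG16's 13 convolutional layers grouped by block; a 2x2 max-pool follows each block.
VGG16_BLOCKS = [
    ["conv1_1", "conv1_2"],
    ["conv2_1", "conv2_2"],
    ["conv3_1", "conv3_2", "conv3_3"],
    ["conv4_1", "conv4_2", "conv4_3"],
    ["conv5_1", "conv5_2", "conv5_3"],
]


def split_layers(num_shared: int) -> tuple[list[str], list[str]]:
    """Split VGG16 conv layers into a shared trunk and a per-sub-network remainder."""
    layers = [name for block in VGG16_BLOCKS for name in block]
    return layers[:num_shared], layers[num_shared:]


def output_stride(layer: str) -> int:
    """Downsampling of a conv layer's output: one 2x pool per preceding block."""
    for k, block in enumerate(VGG16_BLOCKS):
        if layer in block:
            return 2 ** k
    raise ValueError(f"unknown layer: {layer}")


shared, per_branch = split_layers(7)
print("shared:", shared, "stride", output_stride(shared[-1]))
print("per sub-network:", per_branch, "stride", output_stride(per_branch[-1]))
```

Under this assumption the shared trunk ends at stride 4, and each branch adds its own six layers down to stride 16, which is where the duplicated computation (and the efficiency trade-off) comes from.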

Implications and Future Directions

The SAF R-CNN model offers a principled approach to scale variance in pedestrian detection, with potential applications in autonomous vehicles, surveillance systems, and robotics. Combining sub-networks specialized for different scales with an adaptive weighting mechanism is a promising way to improve detection robustness across diverse environments.

Looking forward, the methodology could be extended beyond pedestrian detection to general object detection, which would require further research into how well scale-aware architectures generalize. Reducing the computational cost of running multiple sub-networks while maintaining high detection accuracy would also ease practical deployment across platforms.

Conclusion

This paper presents a substantial advance in pedestrian detection through the Scale-aware Fast R-CNN framework, which directly addresses the prevalent problem of scale variance. The model's strong performance across multiple datasets highlights its potential for wide application and lays groundwork for further work on scale-aware object detection.