Is Faster R-CNN Doing Well for Pedestrian Detection? (1607.07032v2)

Published 24 Jul 2016 in cs.CV

Abstract: Detecting pedestrian has been arguably addressed as a special topic beyond general object detection. Although recent deep learning object detectors such as Fast/Faster R-CNN [1, 2] have shown excellent performance for general object detection, they have limited success for detecting pedestrian, and previous leading pedestrian detectors were in general hybrid methods combining hand-crafted and deep convolutional features. In this paper, we investigate issues involving Faster R-CNN [2] for pedestrian detection. We discover that the Region Proposal Network (RPN) in Faster R-CNN indeed performs well as a stand-alone pedestrian detector, but surprisingly, the downstream classifier degrades the results. We argue that two reasons account for the unsatisfactory accuracy: (i) insufficient resolution of feature maps for handling small instances, and (ii) lack of any bootstrapping strategy for mining hard negative examples. Driven by these observations, we propose a very simple but effective baseline for pedestrian detection, using an RPN followed by boosted forests on shared, high-resolution convolutional feature maps. We comprehensively evaluate this method on several benchmarks (Caltech, INRIA, ETH, and KITTI), presenting competitive accuracy and good speed. Code will be made publicly available.

Authors

Liliang Zhang
Liang Lin
Xiaodan Liang
Kaiming He

Citations (812)

Summary

The paper "Is Faster R-CNN Doing Well for Pedestrian Detection?" by Liliang Zhang, Liang Lin, Xiaodan Liang, and Kaiming He examines how well Faster R-CNN handles pedestrian detection, a task where small instances and hard negative examples are central difficulties. Although Faster R-CNN excels at general object detection, the paper exposes its shortcomings on this task and proposes a simple alternative that combines the Region Proposal Network (RPN) with boosted forests (BF).

Main Contributions and Findings

The primary contributions of this paper are:

Performance of RPN as a Stand-Alone Detector: The paper shows that the RPN, normally only the proposal stage of Faster R-CNN, performs competitively as a stand-alone pedestrian detector. With anchors restricted to a single aspect ratio that matches an average pedestrian and spread over a wide range of scales, the RPN achieves high recall, higher than that of proposals from hand-crafted-feature detectors such as SCF and LDCF.
Downstream Classifier Degradation: Contrary to expectations, feeding the RPN proposals into the Fast R-CNN classifier degrades performance. The authors give two reasons: the convolutional feature maps the classifier pools from have insufficient resolution for small pedestrians (conv5_3 of VGG-16 has a stride of 16 pixels), and the classifier is trained without any bootstrapping strategy for mining hard negative examples.
Enhancements with Boosted Forests: To address both issues, the authors replace the Fast R-CNN classifier with a BF trained on high-resolution convolutional features shared with the RPN, using bootstrapping to mine hard negatives. This removes the need for hand-crafted features while delivering competitive accuracy at good speed.
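
The anchor design described in the first item can be sketched in a few lines of Python. The concrete numbers below (a width/height ratio of 0.41, nine heights starting at 40 px and growing by a factor of 1.3) are assumptions chosen for illustration, and `pedestrian_anchors` and `tile_anchors` are hypothetical helper names, not code from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def pedestrian_anchors(
    num_scales: int = 9,
    min_height: float = 40.0,
    scale_step: float = 1.3,
    aspect_ratio: float = 0.41,
) -> List[Box]:
    """Base anchors centred at (0, 0): one width/height ratio, many heights.

    All default values are illustrative assumptions, not confirmed settings.
    """
    anchors = []
    for i in range(num_scales):
        h = min_height * scale_step ** i  # heights grow geometrically
        w = h * aspect_ratio              # width follows the fixed aspect ratio
        anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def tile_anchors(base: List[Box], feat_h: int, feat_w: int, stride: int) -> List[Box]:
    """Place every base anchor at the image-space centre of each feature-map cell."""
    tiled = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride
            for x1, y1, x2, y2 in base:
                tiled.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return tiled


if __name__ == "__main__":
    base = pedestrian_anchors()
    for x1, y1, x2, y2 in base:
        print(f"height={y2 - y1:6.1f}px  width={x2 - x1:5.1f}px")
    print(len(tile_anchors(base, feat_h=2, feat_w=3, stride=16)))  # 2 * 3 * 9 = 54
```

Faster R-CNN's default anchors for general objects mix three scales with three aspect ratios; spending every anchor on a single upright shape instead concentrates the RPN on the range of pedestrian heights.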

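The resolution argument behind the degradation of the downstream classifier can be checked with simple arithmetic. The sketch below assumes an illustrative 50-pixel-tall pedestrian with a 0.41 width/height ratio, estimates how many feature-map cells its box covers at the strides of VGG-16's conv3_3, conv4_3 and conv5_3 (4, 8 and 16 pixels), and compares that with the 7x7 = 49 bins of Fast R-CNN's RoI pooling. When a box covers fewer cells than there are bins, several bins must read the same cells, so the pooled feature carries little distinct information. The helper name `cells_covered` is hypothetical.

```python
POOL_BINS = 7 * 7  # RoI pooling grid used by Fast R-CNN with VGG-16

# Total stride (input pixels per feature-map cell) of three VGG-16 layers.
LAYER_STRIDES = {"conv3_3": 4, "conv4_3": 8, "conv5_3": 16}


def cells_covered(box_w: float, box_h: float, stride: int) -> float:
    """Approximate area of a box, in feature-map cells, on a map with this stride."""
    return (box_w / stride) * (box_h / stride)


if __name__ == "__main__":
    height = 50.0          # illustrative small pedestrian, in pixels
    width = 0.41 * height  # assumed width/height ratio
    for layer, stride in LAYER_STRIDES.items():
        area = cells_covered(width, height, stride)
        print(f"{layer}: ~{area:.1f} cells for {POOL_BINS} pooling bins")
    # conv3_3: ~64.1 cells for 49 pooling bins
    # conv4_3: ~16.0 cells for 49 pooling bins
    # conv5_3: ~4.0 cells for 49 pooling bins
```
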
Experimental Results

The method is evaluated on four benchmarks: Caltech, INRIA, ETH, and KITTI.

Caltech: The method achieves a log-average miss rate (MR; lower is better) of 9.6%, competitive with or better than the strongest methods reported at the time.
INRIA and ETH: With the same high-resolution features and bootstrapping, the method achieves MRs of 6.9% and 30.2%, respectively, surpassing the previous best results.
KITTI: With an average precision (AP) of 61.15% on the "Moderate" difficulty level, the method is competitive while keeping a practical inference time of 0.5 seconds per image.
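
For reference, the MR figures above follow the Caltech-style convention of averaging the miss rate at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10^-2 and 10^0. The function below is a minimal sketch of that computation under a stated interpolation assumption (use the miss rate at the last curve point not exceeding each reference FPPI, or 1.0 if there is none); the official benchmark evaluation code remains the authoritative definition, and the toy curve is made up.

```python
import math
from typing import List, Tuple


def log_average_miss_rate(curve: List[Tuple[float, float]], num_refs: int = 9) -> float:
    """Log-average miss rate over FPPI in [1e-2, 1e0].

    `curve` holds (fppi, miss_rate) points of a miss-rate-vs-FPPI curve.
    For each of `num_refs` reference FPPI values spaced evenly in log space,
    take the miss rate at the largest curve FPPI not above the reference
    (1.0 if there is none), then average the miss rates in log space.
    """
    points = sorted(curve)
    refs = [10 ** (-2 + 2 * i / (num_refs - 1)) for i in range(num_refs)]
    log_sum = 0.0
    for ref in refs:
        miss = 1.0  # no operating point at this FPPI yet: everything is missed
        for fppi, mr in points:
            if fppi > ref:
                break
            miss = mr
        log_sum += math.log(max(miss, 1e-10))  # clamp to avoid log(0)
    return math.exp(log_sum / num_refs)


if __name__ == "__main__":
    # Toy curve: the miss rate falls as more false positives per image are allowed.
    toy = [(0.005, 0.6), (0.02, 0.4), (0.1, 0.2), (0.5, 0.1), (2.0, 0.05)]
    print(f"MR = {100 * log_average_miss_rate(toy):.1f}%")
```

Averaging in log space (a geometric mean) keeps the low-FPPI end of the curve from being swamped by the high-FPPI end.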

Technical Details and Implementation

Key technical contributions include:

High-Resolution Feature Extraction: The BF classifier pools features from shallower layers such as conv3_3 and conv4_3, whose strides in VGG-16 (4 and 8 pixels) are much finer than the 16-pixel stride of conv5_3, preserving detail that is crucial for small pedestrians.
Bootstrapping for Hard Negative Mining: The boosted forest is trained in a cascade of stages; between stages, the negatives that the current forest scores highest are mined and added to the training set, making the classifier more resistant to false positives.
Combination of Features: RoI features from multiple convolutional layers are concatenated and fed to the BF. Because each tree split thresholds a single feature, the forest is insensitive to the different magnitudes of features from different layers, so no normalization is needed; this makes it easy to exploit several resolutions at once.
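
The multi-layer feature pipeline in the list above can be sketched without any deep-learning framework. This is a simplified stand-in, not the paper's implementation: `roi_max_pool` is a generic RoI max pooling over nested Python lists, `multi_layer_features` concatenates the pooled outputs of several layers, and the toy feature maps, box, and 7x7 output size are assumptions for illustration. No normalization is applied before concatenation; for a tree-based classifier this is harmless, because rescaling a feature by a positive constant only rescales the thresholds its splits learn.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]     # indexed [channel][row][col]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def roi_max_pool(fmap: FeatureMap, box: Box, stride: int, out: int = 7) -> List[float]:
    """Max-pool the part of `fmap` under `box` into an out x out grid per channel."""
    x1, y1, x2, y2 = (c / stride for c in box)  # project the box onto the map
    rows, cols = len(fmap[0]), len(fmap[0][0])
    bin_w = max(x2 - x1, 1e-6) / out
    bin_h = max(y2 - y1, 1e-6) / out
    pooled = []
    for channel in fmap:
        for by in range(out):
            # Every bin reads at least one cell, even when the RoI is tiny,
            # so small RoIs make neighbouring bins repeat the same values.
            r0 = min(rows - 1, max(0, math.floor(y1 + by * bin_h)))
            r1 = min(rows, max(r0 + 1, math.ceil(y1 + (by + 1) * bin_h)))
            for bx in range(out):
                c0 = min(cols - 1, max(0, math.floor(x1 + bx * bin_w)))
                c1 = min(cols, max(c0 + 1, math.ceil(x1 + (bx + 1) * bin_w)))
                pooled.append(max(channel[r][c] for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


def multi_layer_features(box: Box, layers: Sequence[Tuple[FeatureMap, int]],
                         out: int = 7) -> List[float]:
    """Concatenate RoI features from several (feature map, stride) pairs, unnormalized."""
    features: List[float] = []
    for fmap, stride in layers:
        features.extend(roi_max_pool(fmap, box, stride, out))
    return features


if __name__ == "__main__":
    # Toy single-channel maps for a 64x64 image; values differ in scale on purpose.
    fine = [[[float(r * 16 + c) for c in range(16)] for r in range(16)]]  # stride 4
    coarse = [[[1000.0 * (r + c) for c in range(8)] for r in range(8)]]   # stride 8
    feats = multi_layer_features((8, 4, 28, 52), [(fine, 4), (coarse, 8)])
    print(len(feats))  # 49 + 49 = 98
```
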

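The bootstrapping loop described in the list above can be illustrated with a deliberately small stand-in. This sketch swaps in discrete AdaBoost over one-level decision stumps, trained on plain 2-D toy features, in place of the paper's boosted forest on CNN features; the stage schedule `(2, 4, 8)`, the negative counts, and all function names are hypothetical. What it does show is the structure of bootstrapping: before each later stage, the background samples that the current model scores highest are added to the training set, and a larger model is retrained on the harder data.

```python
import math
import random
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, float]  # (feature index, threshold, polarity, weight)


def stump_predict(x: Sequence[float], j: int, t: float, pol: int) -> int:
    return pol if x[j] >= t else -pol


def fit_stump(X, y, w) -> Tuple[float, int, float, int]:
    """Exhaustively pick the weighted-error-minimizing single-feature threshold."""
    best = (math.inf, 0, 0.0, 1)
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, t, pol) != yi)
                if err < best[0]:
                    best = (err, j, t, pol)
    return best


def adaboost(X, y, rounds: int) -> List[Stump]:
    """Discrete AdaBoost over stumps; labels are +1 (pedestrian) / -1 (background)."""
    w = [1.0 / len(X)] * len(X)
    model: List[Stump] = []
    for _ in range(rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, t, pol, alpha))
        # Re-weight: misclassified samples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, t, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model: List[Stump], x: Sequence[float]) -> float:
    return sum(a * stump_predict(x, j, t, p) for j, t, p, a in model)


def bootstrap_train(pos, neg_pool, stages=(2, 4, 8), init_neg=20, per_stage=20, seed=0):
    """Train in stages; before each later stage, add the hardest unused negatives."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(neg_pool)), init_neg))
    model: List[Stump] = []
    for stage, rounds in enumerate(stages):
        if stage > 0:
            # Hard negatives: unused background samples the current model scores highest.
            unused = [i for i in range(len(neg_pool)) if i not in chosen]
            unused.sort(key=lambda i: score(model, neg_pool[i]), reverse=True)
            chosen.update(unused[:per_stage])
        negs = [neg_pool[i] for i in sorted(chosen)]
        X = pos + negs
        y = [1] * len(pos) + [-1] * len(negs)
        model = adaboost(X, y, rounds)
    return model


if __name__ == "__main__":
    rng = random.Random(1)
    pos = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(50)]
    negs = [[rng.gauss(-1, 1.5), rng.gauss(-1, 1.5)] for _ in range(500)]
    model = bootstrap_train(pos, negs)
    fp = sum(score(model, n) > 0 for n in negs)
    print(f"{len(model)} stumps, {fp} of {len(negs)} negatives still scored positive")
```

Retraining with more weak learners at each stage loosely mirrors a forest whose capacity grows as the mined negatives get harder.
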
Implications and Future Directions

The paper illustrates the significance of high-resolution features and hard-negative mining in pedestrian detection:

Practical Implications: For autonomous driving and surveillance applications, the proposed improvements are particularly relevant due to the frequent occurrence of small-sized pedestrians in these scenarios.
Theoretical Implications: The paper exposes a limitation of the Fast R-CNN classifier on small objects, whose RoIs cover only a few cells of a coarse feature map, suggesting that specialized detection domains may need tailored designs rather than off-the-shelf general-purpose detectors.

Speculative Future Developments:

End-to-End Hard Example Mining: The paper notes the relation to Online Hard Example Mining (OHEM), which mines hard examples during end-to-end training. A natural next step would be a thorough comparison with, or an integration of, such online mining techniques.
Enhancements in Deep Learning Techniques: More advanced deep learning architectures can be explored to improve feature resolution adaptively, addressing the intrinsic limitations of current pooling layers in handling small instances.
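
For contrast with the offline bootstrapping used by the BF, the selection step at the heart of OHEM can be sketched as follows: run the forward pass over all candidate RoIs, then back-propagate only through the highest-loss ones, suppressing near-duplicate boxes so that a single hard region does not fill the batch. The 0.7 IoU threshold, the function names, and the toy values are assumptions for illustration, and the gradient step itself is omitted.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_hard_examples(rois: Sequence[Box], losses: Sequence[float],
                         batch_size: int, nms_iou: float = 0.7) -> List[int]:
    """Indices of the highest-loss RoIs, skipping near-duplicates of ones already kept."""
    order = sorted(range(len(rois)), key=lambda i: losses[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(rois[i], rois[k]) <= nms_iou for k in keep):
            keep.append(i)
            if len(keep) == batch_size:
                break
    return keep


if __name__ == "__main__":
    rois = [(0, 0, 10, 10), (0, 0, 10, 11), (20, 20, 30, 30), (40, 40, 50, 50)]
    losses = [0.9, 0.8, 0.1, 0.5]
    print(select_hard_examples(rois, losses, batch_size=2))  # [0, 3]
```

Because the selection happens inside every training iteration, the mined set tracks the current network, whereas the BF's bootstrapping mines negatives only between training stages.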

In conclusion, the paper identifies and addresses critical issues in applying Faster R-CNN to pedestrian detection. The combination of RPN and boosted forests offers a simple yet effective alternative, underscoring the importance of high-resolution features and effective hard negative mining. These insights have practical value for real-world applications and provide a foundation for further work on specialized object detection tasks.
