Face Detection using Deep Learning: An Improved Faster RCNN Approach (1701.08289v1)

Published 28 Jan 2017 in cs.CV

Abstract: In this report, we present a new face detection scheme using deep learning and achieve the state-of-the-art detection performance on the well-known FDDB face detection benchmark evaluation. In particular, we improve the state-of-the-art faster RCNN framework by combining a number of strategies, including feature concatenation, hard negative mining, multi-scale training, model pretraining, and proper calibration of key parameters. As a consequence, the proposed scheme obtained the state-of-the-art face detection performance, making it the best model in terms of ROC curves among all the published methods on the FDDB benchmark.

Authors (3)

Xudong Sun (71 papers)
Pengcheng Wu (25 papers)
Steven C. H. Hoi (94 papers)

Citations (583)


Summary

An Improved Faster RCNN Approach for Face Detection

This paper introduces a face detection framework that enhances the Faster R-CNN architecture with feature concatenation, hard negative mining, multi-scale training, model pretraining, and careful calibration of key parameters. The authors report state-of-the-art performance on the FDDB benchmark, with the best ROC curves among the methods published at the time.

Methodology

The proposed method builds on the existing Faster R-CNN framework, which has two main components: a Region Proposal Network (RPN) that generates candidate regions, and a Fast R-CNN detector that classifies each candidate region and refines its bounding box. Key innovations in the approach include:

Feature Concatenation: This technique enhances feature extraction by pooling from multiple convolutional layers, capturing both low-level and high-level features. This results in a more nuanced representation of Regions of Interest (RoIs), improving detection accuracy.
Hard Negative Mining: Hard negatives are regions that the detector mistakes for faces. A region whose Intersection over Union (IoU) with the ground-truth face region is below 0.5 is treated as a hard negative, and these regions are fed back into training so that the model produces fewer false positives.
Multi-Scale Training: Training images are resized to several different scales rather than a single fixed size, which makes the detector more robust to variation in face size.
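
As a rough illustration of the hard negative mining and multi-scale training strategies, the following minimal Python sketch shows one way they could be expressed. It is not the authors' code. The function names, the (x1, y1, x2, y2) box format, the 0.8 confidence cutoff, the scale values, and the cap on the longer side are all assumptions made for the example; only the 0.5 IoU threshold comes from the summary above.

```python
import random


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_hard_negatives(detections, gt_boxes, iou_thresh=0.5, score_thresh=0.8):
    """Keep confident detections that overlap no ground-truth face well enough.

    detections: list of (box, score) pairs produced by the current model.
    score_thresh is an assumed cutoff for "the model believes this is a face".
    """
    hard = []
    for box, score in detections:
        # Best overlap with any annotated face (0.0 if the image has no faces).
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        if score >= score_thresh and best < iou_thresh:
            hard.append((box, score))
    return hard


def training_scale_factor(height, width, scales=(480, 600, 750),
                          max_side=1250, rng=random):
    """Pick a random target size for the shorter side and return the resize factor.

    The scale values and max_side are illustrative assumptions.
    """
    target = rng.choice(scales)
    factor = target / min(height, width)
    # Do not let the longer side grow beyond max_side.
    if factor * max(height, width) > max_side:
        factor = max_side / max(height, width)
    return factor


# Example: one annotated face and four detections from a hypothetical model.
gt = [(10, 10, 50, 50)]
dets = [((12, 12, 48, 48), 0.95),      # good detection, not a negative
        ((100, 100, 140, 140), 0.90),  # confident but far from any face
        ((200, 200, 220, 220), 0.30),  # low confidence, ignored
        ((30, 10, 70, 50), 0.85)]      # confident but poorly localized
hard_negatives = select_hard_negatives(dets, gt)
```

In a training loop, the boxes returned by `select_hard_negatives` would be added to the background RoIs of the next fine-tuning round, and `training_scale_factor` would be called once per image to choose the resize factor for that iteration.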

The model is trained in stages: it is first pretrained on the WIDER FACE dataset and then fine-tuned on the FDDB dataset.

Experimental Results

The empirical evaluation supports the effectiveness of the proposed enhancements. On the FDDB benchmark, the framework achieves the best ROC curves, under both the discrete and the continuous score, among existing published methods. The experiments employ a comprehensive set of metrics and data augmentation strategies to validate performance improvements.
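
To make the two kinds of FDDB ROC curve concrete, here is a simplified Python sketch. It is not the official evaluation code. It assumes each detection has already been matched to at most one annotated face and carries the IoU of that match (0.0 if unmatched), and it counts false positives the same way for both curves; the function name `fddb_style_roc` and the input layout are hypothetical. Under the discrete score a matched detection counts as 1 when its IoU exceeds 0.5; under the continuous score it contributes the IoU itself.

```python
def fddb_style_roc(detections, num_faces, continuous=False):
    """Return ROC points as (false_positives, true_positive_rate).

    detections: list of (confidence, matched_iou) pairs, where matched_iou
    is the IoU with the matched annotation, or 0.0 if there is no match.
    The list is swept from the most to the least confident detection.
    """
    points = []
    tp_total = 0.0
    false_positives = 0
    for _, matched_iou in sorted(detections, key=lambda d: -d[0]):
        is_hit = matched_iou > 0.5
        if continuous:
            tp_total += matched_iou            # continuous score: the IoU itself
        else:
            tp_total += 1.0 if is_hit else 0.0  # discrete score: 0 or 1
        if not is_hit:
            false_positives += 1
        points.append((false_positives, tp_total / num_faces))
    return points


# Example: four detections on images containing two annotated faces.
dets = [(0.9, 0.8), (0.8, 0.0), (0.7, 0.6), (0.6, 0.4)]
discrete_curve = fddb_style_roc(dets, num_faces=2)
continuous_curve = fddb_style_roc(dets, num_faces=2, continuous=True)
```

Because the continuous score rewards tighter localization, a detector can rank differently on the two curves even when it finds the same faces.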

Ablation studies isolate the contribution of each component. The results suggest that each enhancement, such as hard negative mining and feature concatenation, provides a distinct benefit, and that together they yield significant gains in detection performance.

Implications and Future Directions

The enhancements proposed for face detection with Faster R-CNN are methodological improvements that are also relevant to object detection more broadly. The findings indicate a substantial improvement in accuracy and robustness for real-world face detection scenarios characterized by occlusions, complex poses, and challenging illumination.

Practically, these advancements can be pivotal in applications requiring high precision and recall, such as security and surveillance systems. Theoretically, this research lays the groundwork for further exploration of feature representation and model training methodologies in deep learning contexts.

Future research directions may focus on improving the efficiency and scalability of the approach, facilitating real-time face detection applications. Additionally, adapting these improvements to other domains within object detection could further extend the impact of these findings.

This paper represents a significant step forward in the landscape of face detection, providing tangible improvements through refined proposals and thoughtful experimentation.
