
CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection (1606.05413v1)

Published 17 Jun 2016 in cs.CV

Abstract: Robust face detection in the wild is one of the ultimate components to support various facial related problems, i.e. unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, 3D facial model construction, etc. Although the face detection problem has been intensely studied for decades with various commercial applications, it still meets problems in some real-world scenarios due to numerous challenges, e.g. heavy facial occlusions, extremely low resolutions, strong illumination, exceptionally pose variations, image or video compression artifacts, etc. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN) to robustly solve the problems mentioned above. Similar to the region-based CNNs, our proposed network consists of the region proposal component and the region-of-interest (RoI) detection component. However, far apart of that network, there are two main contributions in our proposed network that play a significant role to achieve the state-of-the-art performance in face detection. Firstly, the multi-scale information is grouped both in region proposal and RoI detection to deal with tiny face regions. Secondly, our proposed network allows explicit body contextual reasoning in the network inspired from the intuition of human vision system. The proposed approach is benchmarked on two recent challenging face detection databases, i.e. the WIDER FACE Dataset which contains high degree of variability, as well as the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our proposed approach trained on WIDER FACE Dataset outperforms strong baselines on WIDER FACE Dataset by a large margin, and consistently achieves competitive results on FDDB against the recent state-of-the-art face detection methods.

Citations (247)

Summary

  • The paper introduces a dual-component CMS-RCNN that integrates multi-scale region proposals and contextual cues to improve face detection.
  • It leverages feature fusion from multiple CNN layers to accurately capture small and occluded faces in unconstrained environments.
  • Experimental results on WIDER FACE and FDDB highlight significant performance gains, surpassing previous benchmarks.

Overview of CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection

The paper "CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection" presents an innovative approach to advance facial recognition tasks amid adverse conditions. The work builds upon Convolutional Neural Networks (CNNs) to improve the robustness of face detection systems in unconstrained environments characterized by challenges such as occlusions, low resolution, variable lighting, and extreme poses. The proposed method, CMS-RCNN, integrates multi-scale contextual reasoning into the detection pipeline, addressing significant limitations present in existing CNN-based approaches.

Methodological Contributions

CMS-RCNN introduces two pivotal components: the Multi-Scale Region Proposal Network (MS-RPN) and the Contextual Multi-Scale Convolution Neural Network (CMS-CNN). This dual-component framework enhances face detection from two perspectives:

  1. Multi-Scale Information Aggregation: The model conducts inference based on features extracted from multiple convolutional layers. This multi-layer fusion addresses the issue of detecting tiny face regions that may be underrepresented, or even missed, when relying solely on high-level features typically used in standard Faster R-CNN implementations.
  2. Contextual Body Reasoning: The paper emphasizes the value of body context in refining face detection performance. By jointly processing facial features and body contextual cues, the model emulates human-like detection reasoning, improving confidence and accuracy when faces are heavily occluded, blurred, or at very low resolution.
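The multi-scale aggregation in point 1 can be sketched as follows. This is a minimal numpy illustration of the fusion idea, not the paper's implementation: the real network operates on learned VGG features, learns the normalization scale factors, and follows the concatenation with a 1x1 convolution; the layer names and channel counts below are illustrative assumptions.

```python
import numpy as np

def l2_normalize(feat, eps=1e-8):
    # Per-location L2 normalization across channels, so feature maps from
    # different layers contribute at comparable magnitudes before fusion.
    norm = np.sqrt((feat ** 2).sum(axis=0, keepdims=True)) + eps
    return feat / norm

def upsample_nearest(feat, out_h, out_w):
    # Nearest-neighbour upsampling to a common spatial resolution.
    c, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows][:, :, cols]

def fuse_multiscale(feats):
    # Bring every feature map to the finest resolution, L2-normalize each,
    # then concatenate along the channel axis (the actual network would
    # follow this with a 1x1 conv to reduce dimensionality).
    out_h = max(f.shape[1] for f in feats)
    out_w = max(f.shape[2] for f in feats)
    resized = [upsample_nearest(f, out_h, out_w) for f in feats]
    return np.concatenate([l2_normalize(f) for f in resized], axis=0)

conv3 = np.random.rand(256, 16, 16)  # hypothetical conv3 activations (C, H, W)
conv4 = np.random.rand(512, 8, 8)
conv5 = np.random.rand(512, 4, 4)
fused = fuse_multiscale([conv3, conv4, conv5])
print(fused.shape)  # (1280, 16, 16)
```

Fusing lower layers this way preserves the fine spatial resolution that tiny faces occupy, which the coarsest layer alone would wash out.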
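The body reasoning in point 2 relies on projecting a body context region from each face proposal, so both can be pooled and processed together. A hedged sketch of such a projection, using fixed spatial ratios (the specific scale and shift values below are illustrative assumptions, not the paper's constants):

```python
def body_roi_from_face(face_box, img_w, img_h,
                       scale_w=3.0, scale_h=4.0, up_shift=0.5):
    # Derive a body context box from a face box using fixed spatial ratios.
    # The body box is wider and taller than the face and extends mostly
    # below it, mimicking where a torso would appear relative to a head.
    x1, y1, x2, y2 = face_box
    fw, fh = x2 - x1, y2 - y1
    cx = (x1 + x2) / 2.0
    bw, bh = fw * scale_w, fh * scale_h
    bx1 = max(0.0, cx - bw / 2.0)       # center horizontally on the face
    by1 = max(0.0, y1 - up_shift * fh)  # shift the top edge slightly above
    bx2 = min(float(img_w), bx1 + bw)   # clip to image bounds
    by2 = min(float(img_h), by1 + bh)
    return (bx1, by1, bx2, by2)

face = (100, 80, 140, 130)  # (x1, y1, x2, y2)
print(body_roi_from_face(face, img_w=640, img_h=480))
```

Because the body box is derived deterministically from the face proposal, no extra proposals are needed: the same RoI pooling machinery can extract both feature sets per candidate.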

Experimental Validation

CMS-RCNN was empirically validated on two challenging benchmarks: the WIDER FACE dataset, which features wide variability in facial appearance, and the FDDB dataset, known for its collection of real-world faces. Key findings show that CMS-RCNN significantly outperforms existing baselines across all levels of detection difficulty. Precision-Recall analysis on WIDER FACE yields state-of-the-art Average Precision (AP) in all three subsets (easy, medium, and hard), with especially notable margins on the medium and hard subsets.
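The AP figures above summarize a precision-recall curve into a single number. A compact sketch of the standard all-point-interpolation computation, assuming detections have already been matched to ground truth (the scores and labels below are toy values, not results from the paper):

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    # Sort detections by descending confidence, accumulate precision and
    # recall, then integrate the area under the monotone PR envelope.
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(fp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Pad the curve at both ends and take the monotone (non-increasing)
    # precision envelope before integrating over recall.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

scores = [0.9, 0.8, 0.7, 0.6]  # detection confidences
is_tp = [1, 0, 1, 1]           # 1 if matched to an unclaimed ground truth
print(average_precision(scores, is_tp, num_gt=4))  # 0.625
```

Reporting AP separately on the easy, medium, and hard subsets, as WIDER FACE does, isolates exactly the small-face regime that the multi-scale fusion targets.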

Implications and Future Directions

The paper establishes a valuable framework for deploying face detection systems in environments that approximate real-world challenges. Practically, CMS-RCNN facilitates more reliable face detection across diverse applications, ranging from security and surveillance to personal device authentication, by enhancing robustness against image quality variances.

In theoretical terms, this research underlines the efficacy of integrating contextual data into region-based CNNs, suggesting further exploration into how contextual cues can be leveraged across different object detection domains. Future work is expected to refine the CMS-RCNN model, particularly by exploring joint end-to-end learning to fully integrate the multi-scale and contextual components, potentially further improving detection fidelity and system efficiency. This paper not only fills a critical gap in face detection literature but also sets a foundation for subsequent advancements in CNN-based object detection methodologies.