A Deep Pyramid Deformable Part Model for Face Detection (1508.04389v1)

Published 18 Aug 2015 in cs.CV

Abstract: We present a face detection algorithm based on Deformable Part Models and deep pyramidal features. The proposed method called DP2MFD is able to detect faces of various sizes and poses in unconstrained conditions. It reduces the gap in training and testing of DPM on deep features by adding a normalization layer to the deep convolutional neural network (CNN). Extensive experiments on four publicly available unconstrained face detection datasets show that our method is able to capture the meaningful structure of faces and performs significantly better than many competitive face detection algorithms.



Summary

The paper presents the Deep Pyramid Deformable Part Model (DP2MFD), a novel face detection approach combining Deformable Part Models with normalized deep convolutional features to handle unconstrained environments.
A key methodological contribution is the introduction of a normalization layer within the deep feature pyramid extraction, which effectively mitigates face size biases and improves detection across scales.
Experimental results on datasets like AFW, FDDB, MALF, and IJB-A demonstrate that DP2MFD achieves superior performance compared to many existing face detection systems, particularly in diverse and challenging conditions.

Overview of "A Deep Pyramid Deformable Part Model for Face Detection"

The paper presents an approach to face detection that combines Deformable Part Models (DPM) with deep pyramidal features, termed DP2MFD. The method is designed to detect faces across a wide range of sizes and poses in unconstrained settings, which typically pose significant challenges to traditional face detection algorithms. Adding a normalization layer to the deep convolutional neural network (CNN) reduces the gap between training and testing that arises when DPM is applied to deep features directly.

Methodology

The proposed DP2MFD system consists of two main components: the creation of a normalized deep feature pyramid and the application of DPMs. The feature pyramid is generated by propagating an image through a deep CNN, producing a hierarchy of deep features at various scales. A crucial innovation in this paper is the introduction of a normalization layer that mitigates biases linked to face size, thereby enhancing the robustness and reliability of the system in diverse conditions.
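As a rough illustration of the first component, the sketch below lists the image sizes that would be fed through the CNN to build a multi-scale pyramid. The function name `pyramid_levels`, the level count, and the per-level scale step are illustrative assumptions, not values taken from this summary.

```python
def pyramid_levels(height, width, levels=7, step=2 ** -0.5):
    """Return (scale, h, w) for each pyramid level, largest first.

    `levels` and `step` are assumed defaults chosen for illustration.
    """
    out = []
    for i in range(levels):
        scale = step ** i
        # Keep every level at least one pixel in each dimension.
        h = max(1, round(height * scale))
        w = max(1, round(width * scale))
        out.append((scale, h, w))
    return out


if __name__ == "__main__":
    for scale, h, w in pyramid_levels(400, 600, levels=3, step=0.5):
        print(f"scale={scale:.2f} size={h}x{w}")
```

Each resized image would then be passed through the network, and the resulting feature maps stacked into the pyramid on which the DPM operates.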

DP2MFD contrasts with previous methods that rely heavily on Haar-like or HOG features for face representation. By extracting features from the $\mathrm{max}_5$ layer of a deep CNN and applying $z$-score normalization, the authors address the inherent limitations of fixed-scale feature maps. This normalization ensures consistent feature activation across pyramid levels, which significantly improves the detection accuracy for faces at varying scales and poses.
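The normalization step can be sketched as follows. This is a minimal illustration that assumes the statistics are computed per channel within each pyramid level; the exact statistics used by the authors are not given in this summary, and the names `zscore_level` and `zscore_pyramid` and the `eps` guard are hypothetical.

```python
import statistics


def zscore_level(level, eps=1e-8):
    """Normalize one pyramid level.

    `level` is a list of feature vectors, one per spatial location,
    each of length C. Each channel is shifted to zero mean and scaled
    to unit standard deviation (assumed per-level, per-channel).
    """
    channels = list(zip(*level))
    means = [statistics.fmean(c) for c in channels]
    stds = [statistics.pstdev(c) for c in channels]
    return [
        [(v - m) / (s + eps) for v, m, s in zip(vec, means, stds)]
        for vec in level
    ]


def zscore_pyramid(pyramid, eps=1e-8):
    """Apply the same normalization independently to every level."""
    return [zscore_level(level, eps) for level in pyramid]


if __name__ == "__main__":
    # Two toy levels whose raw activations differ by a factor of 100.
    pyramid = [
        [[1.0, 10.0], [3.0, 30.0]],
        [[100.0, 1000.0], [300.0, 3000.0]],
    ]
    for level in zscore_pyramid(pyramid):
        print(level)
```

In this toy input the two levels differ only in activation magnitude, and after normalization they produce essentially the same values, which is the kind of cross-level consistency the paragraph above describes.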

Experimental Evaluation

The method was rigorously tested on multiple face detection datasets, including AFW, FDDB, MALF, and IJB-A, demonstrating superior performance against numerous academic and commercial systems. Notably, the DP2MFD algorithm showed robust detection capabilities on AFW and FDDB datasets, outperforming many existing models in terms of precision and recall. On the MALF dataset, while overall performing strongly, it showed a slight reduction in performance on subsets with 'hard' cases. On the IJB-A dataset, DP2MFD displayed significant improvements over previous algorithms, highlighting its efficacy in uncontrolled environments with a diverse set of poses and expressions.

Implications and Future Directions

The implications of this research span both practical applications and future theoretical explorations. Practically, this method can be deployed in real-world scenarios where face detection is essential under varying conditions, such as surveillance, media content analysis, and personal devices. Theoretically, the integration of normalization layers in deep networks for object detection signifies an area ripe for exploration, potentially leading to further enhancements in feature extraction processes. Future developments may explore optimization techniques, such as GPU utilization, to reduce computational costs and improve detection speed, making this approach more commercially viable.

Beyond its current scope, the research opens avenues for refining face detection algorithms by integrating alignment and localization techniques that could augment the foundational performance seen here. Additionally, enhancing model adaptability to capture facial attributes dynamically presents an interesting challenge that could further propel the capabilities of DP2MFD systems.

This paper underscores the continuous evolution in face detection technologies and the pivotal role of leveraging deep learning advancements to overcome traditional limitations, fostering a more resilient and accurate detection paradigm.