Multi-view Face Detection Using Deep Convolutional Neural Networks (1502.02766v3)

Published 10 Feb 2015 in cs.CV

Abstract: In this paper we consider the problem of multi-view face detection. While there has been significant research on this problem, current state-of-the-art approaches for this task require annotation of facial landmarks, e.g. TSM [25], or annotation of face poses [28, 22]. They also require training dozens of models to fully capture faces in all orientations, e.g. 22 models in HeadHunter method [22]. In this paper we propose Deep Dense Face Detector (DDFD), a method that does not require pose/landmark annotation and is able to detect faces in a wide range of orientations using a single model based on deep convolutional neural networks. The proposed method has minimal complexity; unlike other recent deep learning object detection methods [9], it does not require additional components such as segmentation, bounding-box regression, or SVM classifiers. Furthermore, we analyzed scores of the proposed face detector for faces in different orientations and found that 1) the proposed method is able to detect faces from different angles and can handle occlusion to some extent, 2) there seems to be a correlation between distribution of positive examples in the training set and scores of the proposed face detector. The latter suggests that the proposed method's performance can be further improved by using better sampling strategies and more sophisticated data augmentation techniques. Evaluations on popular face detection benchmark datasets show that our single-model face detector algorithm has similar or better performance compared to the previous methods, which are more complex and require annotations of either different poses or facial landmarks.

Authors (3)

Sachin Sudhakar Farfade
Mohammad Saberian
Li-Jia Li

Citations (569)

Summary

The paper presents the Deep Dense Face Detector (DDFD), a method for multi-view face detection based on deep convolutional neural networks (CNNs). It targets the problem of detecting faces across a wide range of angles and orientations, a task for which earlier approaches required multiple models and pose or landmark annotations.

Methodology

DDFD uses a single CNN model to detect faces in a wide range of orientations, with no need for pose or landmark annotations. Unlike other recent deep learning object detection methods, it does not require additional components such as segmentation, bounding-box regression, or SVM classifiers, which keeps the overall pipeline simple.

Training Approach:

The network is fine-tuned from AlexNet on the AFLW dataset, which contains 21K images with 24K annotated faces. Data augmentation expands the training set to 200K positive and 20 million negative examples. At test time the classifier is applied in a sliding-window fashion to produce a heat map of face scores over the image, which avoids the separate region-proposal stage of region-based methods such as R-CNN.
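The sliding-window heat map can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`crop`, `heat_map`, `mean_intensity`), the window size, and the stride are assumptions made for illustration, and `mean_intensity` is a hypothetical stand-in for the CNN's face score.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def crop(image: Image, top: int, left: int, size: int) -> Image:
    """Return the size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def heat_map(image: Image, window: int, stride: int,
             score_fn: Callable[[Image], float]) -> List[List[float]]:
    """Score every window position; each heat-map cell holds one score."""
    height, width = len(image), len(image[0])
    rows = range(0, height - window + 1, stride)
    cols = range(0, width - window + 1, stride)
    return [[score_fn(crop(image, r, c, window)) for c in cols] for r in rows]


def mean_intensity(patch: Image) -> float:
    """Hypothetical stand-in for the CNN's face probability."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0]))


# Toy 6x6 image with a bright 2x2 blob at rows 2-3, columns 2-3.
image = [[0.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1.0

scores = heat_map(image, window=2, stride=2, score_fn=mean_intensity)

# Locate the strongest response as (score, (row, col)) in heat-map coordinates.
best = max((s, (r, c)) for r, row in enumerate(scores) for c, s in enumerate(row))
```

In a real detector the score function would be the network's face probability for each window, and the image would be rescaled several times so that faces of different sizes fall within the fixed window.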

Experimental Analysis

Evaluations were conducted on the PASCAL Face, AFW, and FDDB benchmark datasets. DDFD performs comparably to or better than contemporary methods, including those based on cascades and deformable part models (DPM), even though it does not use the pose or landmark annotations that those methods rely on.

Additionally, an analysis of detection scores for faces in different orientations indicates a correlation between the distribution of positive examples in the training set and the detector's scores. This suggests that performance could be improved further with better sampling strategies.
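One way to quantify such a relationship is a Pearson correlation between the number of training positives in each orientation bin and the mean detector score for that bin. The sketch below assumes this framing; the bins and all numbers are made-up placeholders, not results from the paper, and `pearson` is a helper defined here for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder values only: hypothetical orientation bins, ordered from
# near-upright to heavily rotated faces. These are NOT from the paper.
train_counts = [5000, 3000, 1200, 400, 100]      # positives per bin
mean_scores = [0.95, 0.90, 0.75, 0.55, 0.30]     # mean detector score per bin

r = pearson(train_counts, mean_scores)
```

A value of `r` close to 1 would be consistent with the observation that orientations with more training examples receive higher scores, although correlation alone does not show that the sampling is the cause.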

Key Findings

Simplicity and Efficiency:

DDFD uses a single CNN model and needs no auxiliary modules.

Performance:

Its accuracy is comparable to or better than that of more complex models that require pose or landmark annotations.

Score Analysis:

The distribution of positive examples in the training set appears to influence detector scores, which suggests that better sampling and data augmentation could help.

Implications and Future Directions

The paper shows that a deep CNN can simplify the face detection pipeline while maintaining accuracy across multiple face orientations. The observed relationship between training data distribution and detection scores suggests that more sophisticated data augmentation and sampling methods could improve the model further, including its handling of occluded and rotated faces.
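As one possible sampling strategy, training examples could be drawn with inverse-frequency weights so that every orientation bin is equally likely to be sampled. The sketch below is an illustration under that assumption, not a method from the paper; the bin labels and the `balanced_weights` helper are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def balanced_weights(labels: Sequence[str]) -> List[float]:
    """Per-example weights that give every label the same total probability."""
    counts = Counter(labels)
    num_bins = len(counts)
    return [1.0 / (num_bins * counts[label]) for label in labels]


# Hypothetical orientation labels for a small, imbalanced training set.
labels = ["frontal"] * 8 + ["profile"] * 3 + ["rotated"] * 1
weights = balanced_weights(labels)

# Draw a batch of example indices; rare bins are now sampled as often
# as common ones in expectation.
rng = random.Random(0)
batch = rng.choices(range(len(labels)), weights=weights, k=6)
```

Because the weights within each bin sum to the same value, a rare orientation contributes as many samples in expectation as a common one; the trade-off is that rare examples are repeated more often, so this is usually paired with augmentation.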

In conclusion, DDFD is a promising direction for simple and accurate face detection with a single deep model. Future work could focus on improving the training process and on real-time applications.
