- The paper presents a comprehensive 3D face alignment method that uses a 3D morphable model combined with cascaded CNNs to robustly fit faces even at extreme yaw angles.
- It integrates novel features, PNCC and PAF, along with an optimized weighted parameter distance cost to effectively handle self-occlusion and large appearance variations.
- Extensive data augmentation through face profiling significantly enhances training with large-pose images, leading to reduced normalized mean error on benchmark datasets.
Overview of "Face Alignment in Full Pose Range: A 3D Total Solution"
The paper "Face Alignment in Full Pose Range: A 3D Total Solution" by Xiangyu Zhu, Xiaoming Liu, Zhen Lei, and Stan Z. Li tackles face alignment across the full range of poses, with yaw angles up to 90 degrees, by integrating 3D models. Traditional face alignment algorithms often fail at large yaw angles for three reasons: self-occluded landmarks are hard to model, facial appearance varies drastically with pose, and annotated training data for extreme poses is scarce. The paper addresses all three issues by fitting a 3D morphable model (3DMM) with cascaded convolutional neural networks (CNNs).
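At a high level, cascaded fitting alternates between constructing features from the current parameter estimate and letting a CNN predict a parameter update. The sketch below is a minimal illustration of that loop, not the authors' implementation; `regressors` and `feature_fn` are hypothetical stand-ins for the trained CNNs and the PNCC/PAF feature construction.

```python
import numpy as np

def cascaded_fit(image, p0, regressors, feature_fn):
    """Refine 3DMM parameters p through a cascade of regressors.

    p0         : initial parameter vector (pose + shape + expression)
    regressors : list of callables, each mapping features -> parameter update
    feature_fn : callable building input features (e.g. PNCC) from (image, p)
    """
    p = p0.copy()
    for net in regressors:
        features = feature_fn(image, p)  # features depend on the current fit
        delta = net(features)            # CNN predicts a parameter update
        p = p + delta                    # additive update, as in cascaded regression
    return p
```

Each stage sees features rendered from the previous stage's estimate, which is what makes the cascade progressively correct its own errors.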
Key Contributions
The main innovative aspects of this work include:
- 3D Morphable Model for Face Fitting:
- The paper shifts face alignment from fitting sparse 2D landmark models to fitting a dense 3DMM. This directly addresses self-occlusion: only visible vertices are fitted to detected image patterns, and invisible landmarks are extrapolated from the 3D model when required.
- Cascaded Convolutional Neural Networks:
- The integration of CNNs with a cascaded regression framework significantly improves performance across large poses. This is achieved by employing two novel input features, Projected Normalized Coordinate Code (PNCC) and Pose Adaptive Feature (PAF), which provide complementary benefits and enable effective cascade learning.
- Innovative Cost Function:
- The introduction of the Optimized Weighted Parameter Distance Cost (OWPDC) prioritizes the 3DMM parameters by their effect on the fitted shape, addressing the parameter inequivalence and pathological curvature of conventional cost functions such as the Parameter Distance Cost (PDC) and the Vertex Distance Cost (VDC).
- Extensive Data Augmentation via Face Profiling:
- To overcome the scarcity of large-pose training data, the authors developed a face profiling method that synthesizes profile views by rotating existing face images in 3D. This method generates realistic training samples, significantly augmenting the dataset and enhancing the robustness of the model.
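The dense 3DMM underlying the first contribution expresses a face as a mean shape plus linear identity and expression offsets. The sketch below shows that linear model with generic basis matrices; all names are illustrative (the paper builds on standard morphable-model shape and expression bases).

```python
import numpy as np

def morphable_shape(mean_shape, id_basis, exp_basis, alpha_id, alpha_exp):
    """Dense 3D face as mean shape plus identity and expression offsets.

    mean_shape : (3N,) stacked x/y/z coordinates of the mean face
    id_basis   : (3N, k_id) identity (shape) basis
    exp_basis  : (3N, k_exp) expression basis
    alpha_*    : the 3DMM coefficients recovered by the fitting cascade
    """
    return mean_shape + id_basis @ alpha_id + exp_basis @ alpha_exp
```

Fitting then amounts to regressing the pose parameters together with `alpha_id` and `alpha_exp`, which is exactly what the cascade estimates.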
Methodology
The proposed framework, 3D Dense Face Alignment (3DDFA), operates through the following novel elements:
- Rotation Formulation:
- Quaternions are used instead of traditional Euler angles to eliminate gimbal lock and ambiguities in 3D rotation representations, thus ensuring robustness even at profile views.
- Feature Design:
- Pose Adaptive Feature (PAF): Crops the image based on cylindrical coordinates anchored at the face model, maintaining consistent semantic locations for convolution, which aids pose-invariant feature extraction.
- Projected Normalized Coordinate Code (PNCC): The mean face's 3D coordinates are normalized to a fixed range and used as the colors of the currently fitted 3D face when it is rendered into 2D, producing a smooth, pose-dependent map that is well suited as CNN input.
- Training Strategy:
- To mitigate overfitting risks, an initialization regeneration strategy is applied, ensuring diverse starting points for each cascade iteration.
- A carefully constructed training set augmented by synthetic face profiling data enriches the network with broad pose coverage.
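The rotation formulation above can be made concrete: converting a unit quaternion to a rotation matrix yields a representation free of the gimbal lock that Euler angles suffer near profile views. This is the standard conversion, shown here as a sketch rather than code from the paper.

```python
import numpy as np

def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The quaternion is normalized first, so any nonzero q is accepted.
    """
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
```

Because the map from unit quaternions to rotations is smooth and unambiguous (up to sign), regressing quaternion components remains well behaved even at 90-degree yaw, where Euler-angle parameterizations degenerate.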
Experimental Results
The efficacy of the method was validated across three datasets:
- AFLW: Evaluations demonstrate a significant reduction in Normalized Mean Error (NME) across different yaw angle intervals.
- AFLW2000-3D: For comprehensive 3D face alignment tasks, 3DDFA outperforms existing 2D methods, particularly illustrating the robustness and accuracy of 3D alignment.
- 300W: The proposed method also remains competitive in medium-pose face alignment, performing particularly well on the challenging subset.
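The NME reported in these evaluations is the mean point-to-point landmark error divided by a normalization factor; for AFLW-style protocols the normalizer is commonly the square root of the bounding-box area. A minimal sketch (the choice of normalizer is an assumption, and the exact protocol varies by benchmark):

```python
import numpy as np

def nme(pred, gt, norm_factor):
    """Normalized Mean Error over a set of landmarks.

    pred, gt    : (N, 2) arrays of predicted and ground-truth landmarks
    norm_factor : normalization constant, e.g. sqrt(bbox width * height)
    """
    errors = np.linalg.norm(pred - gt, axis=1)  # per-landmark Euclidean error
    return errors.mean() / norm_factor
```

Normalizing by face size makes errors comparable across images with different resolutions and face scales.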
Implications and Future Work
The proposed 3D-based alignment method overcomes the limitations of traditional 2D methods, particularly in large-pose scenarios, and improves both the robustness and the accuracy of face alignment. Practically, this could benefit face recognition, facial animation, and other vision applications that require precise facial feature localization.
Future work could explore extending the 3DDFA framework to handle dynamic expressions and integrating the model more explicitly with real-time applications. There is also potential for further optimizing the computational efficiency of the feature extraction process, facilitating broader deployment in resource-constrained environments.
In conclusion, this paper presents a substantial advancement in face alignment under extreme pose variations by combining the strengths of 3D modeling and CNN-based regression. The innovations in input feature engineering and cost function design are particularly noteworthy, contributing substantially to the robustness and accuracy gains observed in the empirical evaluations.