Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network (1612.04904v1)

Published 15 Dec 2016 in cs.CV

Abstract: The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition and always under controlled viewing conditions. We claim that this is a symptom of a serious but often overlooked problem with existing methods for single view 3D face reconstruction: when applied "in the wild", their 3D estimates are either unstable and change for different photos of the same subject or they are over-regularized and generic. In response, we describe a robust method for regressing discriminative 3D morphable face models (3DMM). We use a convolutional neural network (CNN) to regress 3DMM shape and texture parameters directly from an input photo. We overcome the shortage of training data required for this purpose by offering a method for generating huge numbers of labeled examples. The 3D estimates produced by our CNN surpass state of the art accuracy on the MICC data set. Coupled with a 3D-3D face matching pipeline, we show the first competitive face recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes as representations, rather than the opaque deep feature vectors used by other modern systems.

Summary

The paper introduces a deep CNN that efficiently regresses 3D morphable model parameters from single images for enhanced face reconstruction.
It leverages a 101-layer ResNet and an asymmetric loss function to improve robustness and discriminative accuracy, validated on MICC, LFW, YTF, and IJB-A benchmarks.
The proposed method processes an image in approximately 0.088 seconds (about 11 images per second), which makes it fast enough for practical face recognition applications.

Regressing Robust and Discriminative 3D Morphable Models with a Very Deep Neural Network

This paper presents an approach to single-view 3D face shape estimation based on a deep convolutional neural network (CNN). The authors address two failure modes of 3D Morphable Model (3DMM) estimation in unconstrained environments: estimates that are unstable, changing across photos of the same subject, and estimates that are over-regularized and generic. Unlike traditional methods that struggle with varying poses and feature occlusion, this work trains a robust regressor to predict 3DMM parameters directly from an input photograph.
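To make "3DMM parameters" concrete, the following is a minimal sketch of how a linear morphable model turns a parameter vector into a 3D shape. It assumes the common formulation in which a face is a mean shape plus a weighted sum of basis vectors; the function name, array layout, and the tiny toy model are hypothetical and are not taken from the paper.

```python
import numpy as np


def reconstruct_shape(mean_shape, basis, coeffs):
    """Return an (n_vertices, 3) array: mean shape plus a linear combination of basis vectors.

    Assumed layout (hypothetical):
      mean_shape: (3 * n_vertices,) flattened x, y, z coordinates
      basis:      (3 * n_vertices, n_components), with any per-component
                  scaling already folded into the columns
      coeffs:     (n_components,) the parameters a regressor would predict
    """
    mean_shape = np.asarray(mean_shape, dtype=float)
    basis = np.asarray(basis, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    flat = mean_shape + basis @ coeffs
    return flat.reshape(-1, 3)


# Toy model with 2 vertices and 2 components, for illustration only.
toy_mean = np.zeros(6)
toy_basis = np.eye(6)[:, :2]
toy_shape = reconstruct_shape(toy_mean, toy_basis, [1.0, 2.0])
```

Under this assumption, two faces can be compared through their coefficient vectors alone, which is what makes the parameters usable as a recognition representation.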

Methodology and Innovations

The primary contribution of this research is the use of a very deep CNN to regress 3DMM shape and texture parameters from a single image. Labeled data for training deep networks on 3D reconstruction tasks is scarce, so the authors generate a large labeled dataset themselves. Its labels are 3DMM parameters derived from multiple images of each subject, which serve as a surrogate for ground truth.
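One plausible way to derive a single training label from multiple images of a subject is to pool the per-image 3DMM estimates. The sketch below assumes a confidence-weighted average; the function name, the confidence scores, and the numbers are hypothetical and serve only to illustrate the idea.

```python
import numpy as np


def pool_subject_parameters(per_image_params, confidences):
    """Pool per-image 3DMM estimates into one per-subject vector.

    per_image_params: (n_images, n_params) array of per-image estimates
    confidences:      (n_images,) non-negative weights (hypothetical quality scores)
    """
    params = np.asarray(per_image_params, dtype=float)
    weights = np.asarray(confidences, dtype=float)
    if params.ndim != 2 or weights.shape != (params.shape[0],):
        raise ValueError("expected one confidence value per image")
    total = weights.sum()
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    # Normalized weighted average over the image axis.
    return (weights / total) @ params


# Two images of the same subject; the second estimate is trusted three times as much.
pooled = pool_subject_parameters([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
```

Every image of the subject would then share the pooled vector as its regression target, so the network is pushed to output the same shape regardless of pose or lighting.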

The authors use a 101-layer ResNet architecture to regress the high-dimensional 3DMM parameter vector. The network is trained with an asymmetric Euclidean loss that penalizes underestimation differently from overestimation, which encourages outputs that are realistic and discriminative rather than over-regularized and generic.
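The following sketch shows one formulation of an asymmetric Euclidean loss that is consistent with this description. It splits the squared error into an overestimation part and an underestimation part, measured along the direction of each target component, and weights them separately. The function name and the weights 1 and 3 are illustrative assumptions, not values confirmed by this summary.

```python
import numpy as np


def asymmetric_euclidean_loss(pred, target, w_over=1.0, w_under=3.0):
    """Squared error with separate weights for over- and underestimation.

    Over/under is judged per component relative to the target's sign, so an
    estimate closer to zero (the mean face) than the target counts as "under".
    Components whose target is exactly 0 have sign 0 and contribute nothing.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    sign = np.sign(target)
    target_mag = sign * target      # |target|
    pred_proj = sign * pred         # prediction measured along the target's sign
    upper = np.maximum(target_mag, pred_proj)
    over = np.sum((target_mag - upper) ** 2)    # nonzero where pred overshoots
    under = np.sum((pred_proj - upper) ** 2)    # nonzero where pred falls short
    return float(w_over * over + w_under * under)


# First component overshoots by 1, second falls short by 1.
example_loss = asymmetric_euclidean_loss([3.0, -1.0], [2.0, -2.0])
```

With equal weights and nonzero targets, this expression reduces to the ordinary squared Euclidean distance. A larger underestimation weight makes it costlier for the network to retreat toward the mean face.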

Empirical Validation

The proposed method is evaluated along three axes:

3D Shape Accuracy: On the MICC dataset, the proposed method produces more accurate 3D shape estimates than existing single-view techniques, surpassing the state of the art reported at the time. Accuracy is measured quantitatively against ground-truth 3D shapes.
Recognition Performance: The regressed 3D shapes are evaluated for face recognition on the LFW, YTF, and IJB-A benchmarks. They prove more robust and discriminative than traditional 3DMM fitting and other recent approaches such as 3DDFA, and they yield competitive recognition performance using 3D face shapes as the representation.
Efficiency: At approximately 0.088 seconds per image (about 11 images per second), 3DMM regression runs in near real time, making it suitable for deployment in real-world applications.
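As a rough illustration of how regressed parameter vectors can act as a recognition representation, the sketch below compares two vectors with cosine similarity. The similarity measure, the decision threshold, and the function names are assumptions made for illustration; this summary does not describe the matching pipeline in detail.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two parameter vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(a @ b / denom)


def same_subject(a, b, threshold=0.5):
    """Hypothetical verification rule: accept the pair if similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors: identical direction versus orthogonal direction.
score_same = cosine_similarity([1.0, 0.0], [2.0, 0.0])
score_diff = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

In a real system the threshold would be tuned on a validation set, and the vectors might be normalized or projected before comparison.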

Implications and Future Directions

This research has both practical and theoretical implications for face recognition and 3D reconstruction. Practically, the method offers a fast and interpretable way to use 3D face shapes in recognition systems without relying on opaque deep feature vectors. Theoretically, it opens avenues for regressing additional parameters, such as expressions, and for integrating 3D shapes into more comprehensive recognition frameworks. Future work may explore deeper networks or alternative architectures that improve accuracy while keeping computational cost manageable.

In conclusion, this paper contributes to face recognition by combining CNNs with 3D morphable models, and it lays groundwork for further research on robust, shape-based face recognition.
