
Learning an Animatable Detailed 3D Face Model from In-The-Wild Images (2012.04012v2)

Published 7 Dec 2020 in cs.CV

Abstract: While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression. Our model, DECA (Detailed Expression Capture and Animation), is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters, while a regressor is trained to predict detail, shape, albedo, expression, pose and illumination parameters from a single image. To enable this, we introduce a novel detail-consistency loss that disentangles person-specific details from expression-dependent wrinkles. This disentanglement allows us to synthesize realistic person-specific wrinkles by controlling expression parameters while keeping person-specific details unchanged. DECA is learned from in-the-wild images with no paired 3D supervision and achieves state-of-the-art shape reconstruction accuracy on two benchmarks. Qualitative results on in-the-wild data demonstrate DECA's robustness and its ability to disentangle identity- and expression-dependent details enabling animation of reconstructed faces. The model and code are publicly available at https://deca.is.tue.mpg.de.

Citations (502)

Summary

  • The paper introduces DECA, a framework that reconstructs animatable 3D face models with detailed expression capture from unconstrained images.
  • It employs a UV displacement map regression with a low-dimensional latent space to disentangle permanent facial features from expression-dependent wrinkles.
  • The approach achieves state-of-the-art shape reconstruction accuracy on two benchmarks and remains robust under challenging conditions, supporting practical applications in AR, VR, and animation.

Learning an Animatable Detailed 3D Face Model from In-The-Wild Images

The paper "Learning an Animatable Detailed 3D Face Model from In-The-Wild Images" proposes an advanced methodology for reconstructing animatable 3D face models from monocular images captured in unconstrained environments. The authors introduce DECA (Detailed Expression Capture and Animation), a novel approach aimed at estimating 3D face shapes and animatable details while disentangling person-specific features from expression-dependent wrinkles.

Technical Overview

DECA addresses limitations of current 3D face reconstruction techniques, which often cannot realistically animate detailed facial expressions, or fail to generalize to in-the-wild images because they are trained on controlled, high-quality scan datasets. The model regresses a UV displacement map from a low-dimensional latent representation that combines person-specific detail parameters with generic expression parameters. This lets DECA synthesize realistic, individual-specific wrinkles that change plausibly as the face is animated.
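To make this concrete, below is a minimal PyTorch sketch of such a displacement decoder. The layer sizes, parameter dimensions, and names (`DetailDecoder`, `delta`, `psi`, `jaw_pose`) are illustrative assumptions, not DECA's published architecture; the point is only that an identity-specific detail code is fused with expression-dependent parameters before being decoded into a UV displacement map.

```python
import torch
import torch.nn as nn

class DetailDecoder(nn.Module):
    """Toy decoder: latent codes -> 1-channel UV displacement map.

    All dimensions (128-d detail code, 50 expression and 3 jaw-pose
    parameters, 256x256 UV map) are assumptions for this sketch,
    not DECA's exact architecture.
    """
    def __init__(self, detail_dim=128, expr_dim=50, jaw_dim=3):
        super().__init__()
        self.fc = nn.Linear(detail_dim + expr_dim + jaw_dim, 256 * 8 * 8)
        layers = []
        channels = [256, 128, 64, 32, 16, 8]
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Upsample(scale_factor=2),   # 8x8 -> ... -> 256x256
                       nn.Conv2d(c_in, c_out, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(8, 1, 3, padding=1), nn.Tanh()]
        self.up = nn.Sequential(*layers)

    def forward(self, delta, psi, jaw_pose):
        # Fuse the person-specific detail code with expression/jaw
        # parameters so wrinkles can vary with expression.
        z = torch.cat([delta, psi, jaw_pose], dim=-1)
        x = self.fc(z).view(-1, 256, 8, 8)
        return self.up(x)  # (B, 1, 256, 256) UV displacement map
```

In the paper, the predicted displacements perturb the coarse mesh along its surface normals in UV space, so the same coarse FLAME geometry can carry different fine detail depending on the latent codes.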

A key innovation is the detail-consistency loss, which separates permanent facial features from those that change with expression. This disentanglement allows person-specific details to be preserved faithfully when the reconstructed face is animated.
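The mechanism behind this loss can be sketched as follows: two images of the same person are encoded, their person-specific detail codes are exchanged, and the re-rendered results are penalized photometrically against the originals. The `render` function and dictionary interface below are hypothetical stand-ins for DECA's differentiable rendering pipeline, and the full training objective also includes landmark, photometric, and regularization terms not shown here.

```python
import torch.nn.functional as F

def detail_consistency_loss(render, img_a, img_b, codes_a, codes_b):
    """Illustrative detail-consistency loss (names are hypothetical).

    For two images of the SAME person, swapping the person-specific
    detail codes while keeping each image's own expression and pose
    should still reproduce that image's wrinkles: detail belongs to
    the identity, wrinkle changes belong to the expression.
    """
    # Re-render image A with B's detail code, and vice versa.
    swapped_a = render(detail=codes_b["detail"], expr=codes_a["expr"],
                       pose=codes_a["pose"], shape=codes_a["shape"],
                       albedo=codes_a["albedo"], light=codes_a["light"])
    swapped_b = render(detail=codes_a["detail"], expr=codes_b["expr"],
                       pose=codes_b["pose"], shape=codes_b["shape"],
                       albedo=codes_b["albedo"], light=codes_b["light"])
    # Photometric penalty: swapped renderings should match the inputs.
    return F.l1_loss(swapped_a, img_a) + F.l1_loss(swapped_b, img_b)
```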

Experimental Results

DECA achieves state-of-the-art shape reconstruction accuracy, confirmed by quantitative evaluations on the NoW and Feng et al. benchmarks. The method remains robust under challenging conditions, including varying pose and illumination, and qualitative results show that reconstructed and animated faces retain visual and geometric coherence.

Implications and Future Work

This research has both practical and theoretical implications. Practically, DECA supports applications such as augmented reality, virtual reality avatars, and animation, owing to its ability to generate realistic, animatable 3D facial models from single images. Theoretically, the detail-consistency loss is a meaningful step toward disentangling complex visual data.

Future investigations may focus on integrating higher-resolution albedo models to enhance rendering quality, as well as addressing facial hair and extreme occlusions. Another potential direction is utilizing video data for improved facial tracking and to capture more nuanced dynamics over time, thereby further expanding DECA's applicability.

In conclusion, this paper presents a notable advancement in 3D face modeling, delivering animatable, detailed reconstructions from diverse, unconstrained data and establishing DECA as a useful tool for both academic research and practical applications in graphics and vision.