Insightful Overview of "AvatarMe: Realistically Renderable 3D Facial Reconstruction 'in-the-wild'"
The paper "AvatarMe: Realistically Renderable 3D Facial Reconstruction 'in-the-wild'" introduces AvatarMe, a method that reconstructs high-quality 3D faces from single images taken under uncontrolled, "in-the-wild" conditions. The work addresses long-standing challenges in face reconstruction by producing high-resolution, photorealistic, render-ready 3D facial models that help bridge the uncanny valley, a notable achievement at the intersection of computer vision, graphics, and machine learning.
Key Components and Methodology
The AvatarMe method is characterized by an intricate pipeline designed for reconstructing realistic, render-ready faces from diverse images. The approach is structured in several distinct stages:
- Data Acquisition and Preparation: The authors captured a dataset of over 200 subjects with state-of-the-art facial capture technology, gathering the high-resolution reflectance maps needed for network training. The authors describe it as the largest dataset of its kind, covering varied facial expressions and diverse subject characteristics.
- Initial 3D Reconstruction: The methodology builds on existing 3D Morphable Models (3DMM) and generative adversarial networks (GANs), specifically the GANFIT model, to produce a base 3D geometry and texture from the input image. The resulting facial texture is then up-sampled to high resolution using a residual channel attention network (RCAN).
- De-lighting and Reflectance Estimation: After the initial reconstruction, image-to-image translation networks remove the baked-in illumination from the texture, yielding the diffuse albedo. Separate networks then infer the specular albedo and the diffuse and specular normals required for rendering. A key innovation is extending the translation networks' input with shape normals and depth information, which improves detail preservation.
- Rendering and Head Completion: With the reconstructed facial geometry and reflectance estimates, AvatarMe can be used to produce highly realistic render-ready faces adaptable to various virtual environments. The method extends the facial geometry to a universal head model for complete avatar rendering.
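The de-lighting stage above conditions the translation networks on more than the texture alone. A minimal sketch of how such a multi-channel input could be assembled is shown below; the tensor layout, channel ordering, and tiny resolution are assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np

# Hypothetical illustration: stack the texture with shape normals and
# depth so a de-lighting network sees geometric cues per texel.
H = W = 4  # toy resolution; the paper operates at high resolution

texture = np.random.rand(H, W, 3)        # RGB texture with baked-in illumination
shape_normals = np.random.rand(H, W, 3)  # per-texel shape normals (x, y, z)
depth = np.random.rand(H, W, 1)          # per-texel depth

# Concatenate along the channel axis: 3 + 3 + 1 = 7 input channels.
net_input = np.concatenate([texture, shape_normals, depth], axis=-1)
print(net_input.shape)  # (4, 4, 7)
```

The extra channels give the network an explicit notion of surface orientation and distance, which is what the paper credits for better detail preservation.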
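Once the diffuse albedo, specular albedo, and normals are estimated, they can drive a standard shading model. The sketch below uses a simple Blinn-Phong-style shader in NumPy to illustrate how the separate reflectance maps combine at render time; it is a toy stand-in under assumed inputs, not AvatarMe's actual physically based renderer:

```python
import numpy as np

def shade(diffuse_albedo, specular_albedo, normals, light_dir, view_dir,
          shininess=32.0):
    """Toy Blinn-Phong shading of per-texel reflectance maps."""
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)          # half-vector
    ndotl = np.clip(np.einsum('...k,k->...', n, l), 0.0, 1.0)
    ndoth = np.clip(np.einsum('...k,k->...', n, h), 0.0, 1.0)
    diffuse = diffuse_albedo * ndotl[..., None]
    specular = specular_albedo * (ndoth ** shininess)[..., None]
    return np.clip(diffuse + specular, 0.0, 1.0)

# Usage with flat, front-facing toy maps (all values are made up):
H = W = 2
albedo_d = np.full((H, W, 3), 0.6)               # diffuse albedo
albedo_s = np.full((H, W, 1), 0.3)               # specular albedo
normals = np.zeros((H, W, 3)); normals[..., 2] = 1.0
out = shade(albedo_d, albedo_s, normals,
            np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]))
print(out[0, 0])  # [0.9 0.9 0.9]: 0.6 diffuse + 0.3 specular at normal incidence
```

Keeping diffuse and specular terms separate is exactly why the method estimates them as distinct maps: each feeds a different term of the shading equation.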
Numerical Results and Comparative Analysis
The paper provides substantial quantitative evidence of AvatarMe's efficacy. In comparisons with state-of-the-art methods, AvatarMe achieves higher Peak Signal-to-Noise Ratio (PSNR) for both albedo and normal maps. Beyond the numbers, qualitative assessments demonstrate robustness to variations in input lighting conditions and input type, ranging from detailed color images to sketches, showcasing the method's flexibility and breadth of application.
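PSNR, the metric used in these comparisons, is straightforward to compute; a minimal NumPy version is sketched below, assuming maps normalized to [0, 1]:

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two maps in [0, max_val]."""
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float('inf')  # identical maps
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: constant error of 0.5 gives MSE = 0.25,
# so PSNR = 10 * log10(1 / 0.25) ≈ 6.02 dB.
pred = np.full((4, 4, 3), 0.5)
gt = np.zeros((4, 4, 3))
print(round(psnr(pred, gt), 2))  # 6.02
```

Higher PSNR means lower mean squared error against the ground-truth reflectance maps, which is the sense in which the paper reports AvatarMe's advantage.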
Implications and Further Directions
The development of AvatarMe marks a significant contribution to the field of 3D facial reconstruction. The paper's findings hold substantial promise for applications within entertainment, virtual reality, and security sectors, where realistic and dynamic human modeling is paramount.
From a theoretical standpoint, the research demonstrates the potential of leveraging high-resolution datasets alongside advanced GANs and domain-specific CNN architectures to tackle longstanding challenges in photorealistic rendering. Future work may build upon AvatarMe by integrating even more sophisticated facial capture techniques, further increasing robustness across diverse demographic groups, and extending the approach to full-body reconstruction.
In conclusion, AvatarMe is a salient step forward in marrying machine learning with computer graphics, effectively narrowing the gap between artificial reconstructions and their real-world counterparts. The methodology's extension beyond facial reconstruction offers exciting prospects for the broader application of AI-driven realistic renderings.