Multiface: A Dataset for Neural Face Rendering (2207.11243v2)

Published 22 Jul 2022 in cs.CV and cs.GR

Abstract: Photorealistic avatars of human faces have come a long way in recent years, yet research along this area is limited by a lack of publicly available, high-quality datasets covering both, dense multi-view camera captures, and rich facial expressions of the captured subjects. In this work, we present Multiface, a new multi-view, high-resolution human face dataset collected from 13 identities at Reality Labs Research for neural face rendering. We introduce Mugsy, a large scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance. The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence. Along with the release of the dataset, we conduct ablation studies on the influence of different model architectures toward the model's interpolation capacity of novel viewpoint and expressions. With a conditional VAE model serving as our baseline, we found that adding spatial bias, texture warp field, and residual connections improves performance on novel view synthesis. Our code and data is available at: https://github.com/facebookresearch/multiface

Summary

  • The paper demonstrates that the Multiface dataset overcomes current limitations in neural face rendering by offering dense multi-view captures with diverse facial expressions.
  • It employs a conditional Variational Autoencoder enhanced with spatial biases, warp fields, and differentiable rendering to optimize novel view synthesis.
  • Results show reduced reconstruction errors, paving the way for improved VR telepresence and avatar realism in practical applications.

Multiface: A Dataset for Neural Face Rendering

The paper introduces "Multiface," a dataset designed to address the data limitations facing researchers in neural face rendering. Centered on high-resolution, multi-view capture of facial expressions, the dataset is aimed at advancing research in VR telepresence. The work, conducted at Meta Reality Labs Research, shows how complex neural models can be trained on carefully curated capture data.

The authors address a pressing need in the photorealistic-avatar domain: the shortage of publicly accessible datasets that pair dense multi-view captures with rich variation in facial expression. Such data is instrumental for real-time telepresence systems, which must reconstruct human expressions with high enough fidelity to avoid the uncanny valley.

Dataset Design and Capture System

The Multiface dataset covers 13 identities, captured with a multi-camera apparatus called "Mugsy" that records synchronized, high-resolution video of facial performances from many viewpoints at once. Mugsy supports 160 color cameras capturing at a resolution of 4096x2668 pixels. This dense camera array gives the dataset coverage of facial data that considerably exceeds existing datasets in both resolution and the variety of expressions captured.

Methodological Contributions

The authors employ a conditional Variational Autoencoder (VAE) as the baseline model for evaluating novel view synthesis and expression interpolation. The paper presents an ablation study demonstrating how architectural changes affect the generative model's performance, specifically the contributions of spatial biases, texture warp fields, and residual connections to rendering quality.

Here are some of the critical methodological advancements (a minimal code sketch follows the list):

  • Spatial Biases and Warp Fields: Added to the VAE architecture, these components significantly improve the model's ability to interpolate and generalize to new expressions and viewpoints.
  • Differentiable Rendering: Using Nvdiffrast for differentiable rendering, the authors backpropagate a screen-space loss to jointly optimize the texture and geometry predictions.
  • Enhanced Training Pipeline: Training relies on optimized color-correction parameters and personalized tracking, ensuring consistency and accuracy across the dataset.
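
To make these components concrete, here is a minimal PyTorch sketch of how a view-conditioned texture decoder could combine the three ablated ingredients: per-resolution learned spatial biases, a predicted 2D warp field applied to the decoded texture, and residual connections in the upsampling blocks. All module names, tensor shapes, and magnitudes here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualUpBlock(nn.Module):
    """2x-upsampling conv block with a residual connection (illustrative)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(c_out, c_out, 3, padding=1),
        )
        self.skip = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return F.leaky_relu(self.conv(x) + self.skip(x), 0.2)

class TextureDecoder(nn.Module):
    """Hypothetical view-conditioned texture decoder combining spatial
    biases, a texture warp field, and residual up-blocks."""
    def __init__(self, z_dim=256, view_dim=3, base=512, steps=6):
        super().__init__()
        self.c0 = base
        self.fc = nn.Linear(z_dim + view_dim, base * 4 * 4)
        chans = [base // (2 ** i) for i in range(steps + 1)]
        self.blocks = nn.ModuleList(
            [ResidualUpBlock(chans[i], chans[i + 1]) for i in range(steps)]
        )
        # One learned, untied bias map per decoder resolution ("spatial bias").
        self.biases = nn.ParameterList(
            [nn.Parameter(torch.zeros(1, chans[i + 1],
                                      4 * 2 ** (i + 1), 4 * 2 ** (i + 1)))
             for i in range(steps)]
        )
        self.to_tex = nn.Conv2d(chans[-1], 3, 1)   # RGB texture head
        self.to_warp = nn.Conv2d(chans[-1], 2, 1)  # 2D warp-field head

    def forward(self, z, view_dir):
        x = self.fc(torch.cat([z, view_dir], dim=-1)).view(-1, self.c0, 4, 4)
        for block, bias in zip(self.blocks, self.biases):
            x = block(x) + bias                    # spatial bias at each scale
        tex = torch.sigmoid(self.to_tex(x))
        warp = torch.tanh(self.to_warp(x)) * 0.1   # small UV offsets (assumed scale)
        # Resample the decoded texture through the predicted warp field.
        B, _, H, W = tex.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H, device=tex.device),
                                torch.linspace(-1, 1, W, device=tex.device),
                                indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, H, W, 2)
        return F.grid_sample(tex, grid + warp.permute(0, 2, 3, 1),
                             align_corners=False)
```

For example, `TextureDecoder()(torch.randn(2, 256), torch.randn(2, 3))` returns a (2, 3, 256, 256) texture batch. One plausible reading of why the spatial bias helps is that untied per-resolution bias maps let the decoder break the translational symmetry of convolutions, which suits a spatially anchored output like a face texture.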

Results and Implications

The experimental results consistently show reduced reconstruction errors with the modified architectures, especially under conditions of novel view and expression synthesis. The paper’s results suggest that architectural innovations can tangibly improve the model's performance even when trained on a limited number of camera views.
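
To illustrate how a screen-space reconstruction error of this kind can be computed, the sketch below rasterizes a predicted mesh and texture with Nvdiffrast and compares against an image from a held-out camera. The function name, tensor shapes, and the plain mean-squared-error choice are assumptions for illustration; the `dr.rasterize`, `dr.interpolate`, `dr.texture`, and `dr.antialias` calls are Nvdiffrast's actual primitives.

```python
import torch
import nvdiffrast.torch as dr

def screen_space_loss(glctx, verts_clip, tris, uvs, uv_idx, tex, target,
                      resolution=(1024, 1024)):
    """Differentiable screen-space L2 error against a held-out camera image.

    verts_clip: (1, V, 4) float32 vertices in clip space for this camera
    tris:       (T, 3)    int32 triangle vertex indices
    uvs:        (1, V, 2) float32 per-vertex texture coordinates
    uv_idx:     (T, 3)    int32 triangle indices into uvs
    tex:        (1, H, W, 3) float32 decoded texture map
    target:     (1, Hr, Wr, 3) float32 ground-truth image at `resolution`
    """
    # Rasterize the mesh into screen space.
    rast, _ = dr.rasterize(glctx, verts_clip, tris, resolution=resolution)
    # Interpolate per-vertex UVs to get a UV coordinate per pixel.
    uv, _ = dr.interpolate(uvs, rast, uv_idx)
    # Sample the decoded texture at those UVs.
    color = dr.texture(tex, uv, filter_mode="linear")
    # Antialias silhouette edges so gradients also flow into the geometry.
    color = dr.antialias(color, rast, verts_clip, tris)
    return torch.mean((color - target) ** 2)

# Hypothetical usage on a camera excluded from training:
# glctx = dr.RasterizeCudaContext()
# loss = screen_space_loss(glctx, verts_clip, tris, uvs, uv_idx, tex, held_out_img)
# loss.backward()  # gradients reach both the texture and the vertex positions
```

Because the loss is defined in image space rather than texture space, errors in the predicted geometry and the predicted texture are penalized in the same units, which is the usual motivation for a differentiable-rendering loss.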

From a practical perspective, the release of Multiface and the accompanying model architectures democratizes access to high-resolution, multi-view facial data. This accessibility is crucial for the future development of VR applications, enabling more researchers to contribute innovations that improve avatar realism.

Future Directions

This work encourages further exploration into:

  • Neural Rendering Advancements: Further development of neural rendering techniques that can handle dynamic illumination or integrate multi-modal data inputs such as audio within the rendering pipeline.
  • Expanding Data Diversity: Capturing increased subject diversity in future datasets to improve model robustness across varied demographics.
  • Hardware Optimizations: Adapting the rendering models so they can operate efficiently on consumer-grade hardware, thereby increasing accessibility and applicability of VR technologies.

In conclusion, Multiface represents a significant contribution to neural face rendering research, with the potential to catalyze advances in VR telepresence and related fields. As computing power grows and algorithms mature, datasets like Multiface will remain crucial to bridging the gap between virtual representations and human likeness.