Multiface: A Dataset for Neural Face Rendering (2207.11243v2)

Published 22 Jul 2022 in cs.CV and cs.GR

Abstract: Photorealistic avatars of human faces have come a long way in recent years, yet research along this area is limited by a lack of publicly available, high-quality datasets covering both, dense multi-view camera captures, and rich facial expressions of the captured subjects. In this work, we present Multiface, a new multi-view, high-resolution human face dataset collected from 13 identities at Reality Labs Research for neural face rendering. We introduce Mugsy, a large scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance. The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence. Along with the release of the dataset, we conduct ablation studies on the influence of different model architectures toward the model's interpolation capacity of novel viewpoint and expressions. With a conditional VAE model serving as our baseline, we found that adding spatial bias, texture warp field, and residual connections improves performance on novel view synthesis. Our code and data is available at: https://github.com/facebookresearch/multiface

Summary

  • The paper demonstrates that the Multiface dataset overcomes current limitations in neural face rendering by offering dense multi-view captures with diverse facial expressions.
  • It employs a conditional Variational Autoencoder enhanced with spatial biases, warp fields, and differentiable rendering to optimize novel view synthesis.
  • Results show reduced reconstruction errors, paving the way for improved VR telepresence and avatar realism in practical applications.

Multiface: A Dataset for Neural Face Rendering

The paper introduces "Multiface," a dataset designed to address the data limitations facing researchers in neural face rendering. Centered on high-resolution, multi-view capture of facial expressions, the dataset is aimed at advancing research in VR telepresence. The work, conducted at Meta Reality Labs Research, shows how complex neural models can be trained on carefully curated capture data.

The authors address a pressing need in the photorealistic-avatar domain: the shortage of publicly accessible datasets that pair dense multi-view captures with rich variation in facial expression. Such data is instrumental for real-time telepresence systems, which must reconstruct human expressions with high enough fidelity to avoid the uncanny valley.

Dataset Design and Capture System

The Multiface dataset covers 13 identities, captured with a multi-camera apparatus called "Mugsy" that records synchronized, high-resolution video of facial performances from many viewpoints at once. Mugsy supports 160 color cameras capturing at a resolution of 4096x2668 pixels. This dense camera array gives the dataset coverage of facial data that considerably exceeds existing datasets in both resolution and the variety of expressions captured.

Methodological Contributions

The authors employ a conditional Variational Autoencoder (VAE) as the baseline model for evaluating novel view synthesis and expression interpolation. The paper presents an ablation study demonstrating how architectural changes affect the generative model's performance, specifically the contributions of spatial biases, texture warp fields, and residual connections to rendering quality.

Here are some of the critical methodological advancements (a minimal code sketch follows the list):

  • Spatial Biases and Warp Fields: Added to the VAE architecture, these components significantly improve the model's ability to interpolate and generalize to new expressions and viewpoints.
  • Differentiable Rendering: Using Nvdiffrast for differentiable rendering, the authors backpropagate a screen-space loss to jointly optimize the texture and geometry predictions.
  • Enhanced Training Pipeline: Training relies on optimized color-correction parameters and personalized tracking, ensuring consistency and accuracy across the dataset.
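
To make these components concrete, here is a minimal PyTorch sketch of how a view-conditioned texture decoder could combine the three ablated ingredients: per-resolution learned spatial biases, a predicted 2D warp field applied to the decoded texture, and residual connections in the upsampling blocks. All module names, tensor shapes, and magnitudes here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualUpBlock(nn.Module):
    """2x-upsampling conv block with a residual connection (illustrative)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(c_out, c_out, 3, padding=1),
        )
        self.skip = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return F.leaky_relu(self.conv(x) + self.skip(x), 0.2)

class TextureDecoder(nn.Module):
    """Hypothetical view-conditioned texture decoder combining spatial
    biases, a texture warp field, and residual up-blocks."""
    def __init__(self, z_dim=256, view_dim=3, base=512, steps=6):
        super().__init__()
        self.c0 = base
        self.fc = nn.Linear(z_dim + view_dim, base * 4 * 4)
        chans = [base // (2 ** i) for i in range(steps + 1)]
        self.blocks = nn.ModuleList(
            [ResidualUpBlock(chans[i], chans[i + 1]) for i in range(steps)]
        )
        # One learned, untied bias map per decoder resolution ("spatial bias").
        self.biases = nn.ParameterList(
            [nn.Parameter(torch.zeros(1, chans[i + 1],
                                      4 * 2 ** (i + 1), 4 * 2 ** (i + 1)))
             for i in range(steps)]
        )
        self.to_tex = nn.Conv2d(chans[-1], 3, 1)   # RGB texture head
        self.to_warp = nn.Conv2d(chans[-1], 2, 1)  # 2D warp-field head

    def forward(self, z, view_dir):
        x = self.fc(torch.cat([z, view_dir], dim=-1)).view(-1, self.c0, 4, 4)
        for block, bias in zip(self.blocks, self.biases):
            x = block(x) + bias                    # spatial bias at each scale
        tex = torch.sigmoid(self.to_tex(x))
        warp = torch.tanh(self.to_warp(x)) * 0.1   # small UV offsets (assumed scale)
        # Resample the decoded texture through the predicted warp field.
        B, _, H, W = tex.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H, device=tex.device),
                                torch.linspace(-1, 1, W, device=tex.device),
                                indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, H, W, 2)
        return F.grid_sample(tex, grid + warp.permute(0, 2, 3, 1),
                             align_corners=False)
```

For example, `TextureDecoder()(torch.randn(2, 256), torch.randn(2, 3))` returns a (2, 3, 256, 256) texture batch. One plausible reading of why the spatial bias helps is that untied per-resolution bias maps let the decoder break the translational symmetry of convolutions, which suits a spatially anchored output like a face texture.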

Results and Implications

The experimental results consistently show reduced reconstruction errors with the modified architectures, especially under conditions of novel view and expression synthesis. The paper’s results suggest that architectural innovations can tangibly improve the model's performance even when trained on a limited number of camera views.
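
To illustrate how a screen-space reconstruction error of this kind can be computed, the sketch below rasterizes a predicted mesh and texture with Nvdiffrast and compares against an image from a held-out camera. The function name, tensor shapes, and the plain mean-squared-error choice are assumptions for illustration; the `dr.rasterize`, `dr.interpolate`, `dr.texture`, and `dr.antialias` calls are Nvdiffrast's actual primitives.

```python
import torch
import nvdiffrast.torch as dr

def screen_space_loss(glctx, verts_clip, tris, uvs, uv_idx, tex, target,
                      resolution=(1024, 1024)):
    """Differentiable screen-space L2 error against a held-out camera image.

    verts_clip: (1, V, 4) float32 vertices in clip space for this camera
    tris:       (T, 3)    int32 triangle vertex indices
    uvs:        (1, V, 2) float32 per-vertex texture coordinates
    uv_idx:     (T, 3)    int32 triangle indices into uvs
    tex:        (1, H, W, 3) float32 decoded texture map
    target:     (1, Hr, Wr, 3) float32 ground-truth image at `resolution`
    """
    # Rasterize the mesh into screen space.
    rast, _ = dr.rasterize(glctx, verts_clip, tris, resolution=resolution)
    # Interpolate per-vertex UVs to get a UV coordinate per pixel.
    uv, _ = dr.interpolate(uvs, rast, uv_idx)
    # Sample the decoded texture at those UVs.
    color = dr.texture(tex, uv, filter_mode="linear")
    # Antialias silhouette edges so gradients also flow into the geometry.
    color = dr.antialias(color, rast, verts_clip, tris)
    return torch.mean((color - target) ** 2)

# Hypothetical usage on a camera excluded from training:
# glctx = dr.RasterizeCudaContext()
# loss = screen_space_loss(glctx, verts_clip, tris, uvs, uv_idx, tex, held_out_img)
# loss.backward()  # gradients reach both the texture and the vertex positions
```

Because the loss is defined in image space rather than texture space, errors in the predicted geometry and the predicted texture are penalized in the same units, which is the usual motivation for a differentiable-rendering loss.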

From a practical perspective, the release of Multiface and the accompanying model architectures democratizes access to high-resolution, multi-view facial data. This accessibility is crucial for the future development of VR applications, enabling more researchers to contribute innovations that improve avatar realism.

Future Directions

This work encourages further exploration into:

  • Neural Rendering Advancements: Further development of neural rendering techniques that can handle dynamic illumination or integrate multi-modal data inputs such as audio within the rendering pipeline.
  • Expanding Data Diversity: Capturing increased subject diversity in future datasets to improve model robustness across varied demographics.
  • Hardware Optimizations: Adapting the rendering models so they can operate efficiently on consumer-grade hardware, thereby increasing accessibility and applicability of VR technologies.

In conclusion, Multiface represents a significant contribution to neural face rendering research, with the potential to catalyze advances in VR telepresence and related fields. As computing power grows and algorithms mature, datasets like Multiface will remain crucial to bridging the gap between virtual representations and human likeness.