- The paper presents SynthFace, a novel dataset generation technique that uses conditioned Stable Diffusion and FLAME 3DMM to create 250K photorealistic facial images with 3D shape parameters.
- It introduces ControlFace, a neural network trained without 3D supervision, achieving competitive reconstruction performance by leveraging a masked mesh loss and ArcFace for feature extraction.
- The work addresses biases in existing datasets through balanced race and gender representation, while also highlighting future opportunities for incorporating expressive diversity and improved conditioning strategies.
"Fake It Without Making It: Conditioned Face Generation for Accurate 3D Face Reconstruction" (2307.13639)
Introduction
The paper addresses a key limitation in 3D face reconstruction, the scarcity of 3D training data, by introducing a dataset generation approach termed SynthFace. SynthFace leverages Stable Diffusion, conditioned on depth maps sampled from the FLAME 3D Morphable Model (3DMM) of the human face, to synthesize 250,000 photorealistic facial images together with their corresponding shape parameters and depth maps, balanced by race and gender.
Figure 1: SynthFace dataset, featuring photorealistic faces and corresponding 3DMM shape parameters and depth maps.
SynthFace Dataset and Generation Process
Dataset Creation
SynthFace is constructed with a pipeline that combines 2D and 3D generative models. The FLAME 3DMM generates shape-consistent depth maps, which condition a Stable Diffusion model, via ControlNet, to produce realistic 2D facial images.
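The depth-conditioning step can be sketched without the full FLAME decoder: given 3D vertices (here a random point cloud standing in for a FLAME mesh), a pinhole projection plus z-buffering produces the kind of depth map used to condition Stable Diffusion through ControlNet. The function name and parameters below are illustrative, not the paper's implementation.

```python
import numpy as np

def render_depth_map(vertices, size=64, focal=1.2):
    """Project 3D vertices to the image plane and z-buffer them into a
    coarse depth map (nearest-point splatting; a stand-in for a real
    mesh rasterizer)."""
    # Camera looks down -z; vertices in front of the camera have z < 0.
    z = -vertices[:, 2]
    x = focal * vertices[:, 0] / z
    y = focal * vertices[:, 1] / z
    # Map normalized coordinates [-1, 1] to pixel indices.
    u = np.clip(((x + 1) * 0.5 * (size - 1)).astype(int), 0, size - 1)
    v = np.clip(((y + 1) * 0.5 * (size - 1)).astype(int), 0, size - 1)
    depth = np.full((size, size), np.inf)
    for ui, vi, zi in zip(u, v, z):
        depth[vi, ui] = min(depth[vi, ui], zi)  # keep the closest point
    depth[np.isinf(depth)] = 0.0  # background pixels get depth 0
    return depth
```

In the actual pipeline, a map like this (rendered from a sampled FLAME shape) is fed to ControlNet as the conditioning image, so the diffusion model is free to vary texture and identity while the underlying geometry stays fixed.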
Figure 2: SynthFace Generator utilizes the FLAME decoder and depth maps to produce photorealistic faces.
The dataset comprises diverse visual identities for the same 3D shape by using various perspective projections (Figure 3), aiming to disentangle identity and perspective effects from the underlying 3D shape. The dataset is notably balanced among different race and gender groups, addressing biases revealed in previous computer vision datasets [buolamwini2018gender].
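The role of perspective can be illustrated with a plain pinhole projection: the same 3D points, viewed with different focal lengths and camera distances, project to different 2D configurations even though the underlying shape is identical. This is a minimal sketch; the paper's actual camera sampling strategy is not reproduced here.

```python
import numpy as np

def project(vertices, focal, cam_dist):
    """Pinhole projection of a fixed 3D shape from a given camera distance."""
    z = vertices[:, 2] + cam_dist          # camera sits cam_dist in front
    return focal * vertices[:, :2] / z[:, None]

# The same 3D shape, viewed with a short and a long lens (dolly-zoom effect):
pts = np.array([[0.0, 0.0, 0.0], [0.3, 0.1, 0.2], [-0.2, 0.2, -0.1]])
wide = project(pts, focal=0.5, cam_dist=1.0)   # strong perspective distortion
tele = project(pts, focal=2.0, cam_dist=4.0)   # near-orthographic appearance
```

Training on many such projections of each shape encourages the network to attribute these 2D differences to the camera rather than to the 3D geometry.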
Figure 3: The SynthFace Dataset features different perspectives and identities for the same 3D shape.
ControlFace Neural Network
Architecture and Training
ControlFace, trained on SynthFace, reconstructs 3D facial shape without any 3D supervision. It employs ArcFace to extract identity features, which a mapping network then processes to predict 3DMM parameters for accurate 3D face reconstruction.
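The embedding-to-parameters step can be sketched as a small forward pass. The dimensions are assumptions based on common practice (ArcFace embeddings are typically 512-d; FLAME exposes up to 300 shape coefficients), and the two-layer MLP is an illustrative stand-in whose exact architecture may differ from the paper's mapping network.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB, HID, SHAPE = 512, 256, 300  # assumed: ArcFace dim, hidden dim, FLAME shape dim

# Randomly initialized weights stand in for trained parameters.
W1 = rng.standard_normal((EMB, HID)) * 0.01
W2 = rng.standard_normal((HID, SHAPE)) * 0.01

def mapping_network(arcface_embedding):
    """Map an identity embedding to 3DMM shape coefficients."""
    h = np.maximum(arcface_embedding @ W1, 0.0)  # ReLU hidden layer
    return h @ W2                                # predicted shape coefficients

params = mapping_network(rng.standard_normal(EMB))
```

Because ArcFace is trained for identity discrimination, its embeddings already encode shape-relevant cues, which is what makes a comparatively small mapping network sufficient.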
Figure 4: ControlFace training process on the SynthFace dataset.
The network's training strategy focuses on minimizing mesh reconstruction errors directly, using a masked mesh loss function that emphasizes facial regions over peripheral areas.
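A masked loss of this kind can be sketched as a per-vertex squared error restricted to face-region vertices; the specific mask and any per-region weighting used by ControlFace are assumptions here.

```python
import numpy as np

def masked_mesh_loss(pred_vertices, gt_vertices, face_mask):
    """Mean squared vertex error, restricted to masked (facial) vertices.

    pred_vertices, gt_vertices: (V, 3) arrays of mesh vertex positions.
    face_mask: (V,) boolean array, True for face-region vertices and
    False for peripheral areas (e.g. ears, scalp), so errors there do
    not dominate the objective.
    """
    diff = pred_vertices[face_mask] - gt_vertices[face_mask]
    return np.mean(np.sum(diff ** 2, axis=1))
```

Supervising vertex positions directly, rather than 3DMM coefficients, ties the loss to the quantity the benchmarks actually measure: geometric error on the mesh.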
Experimental Evaluation and Results
ControlFace is evaluated on the NoW benchmark, achieving competitive reconstruction performance without ever training on ground-truth 3D data. Its results are comparable to state-of-the-art methods such as MICA and AlbedoGAN.
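The flavor of the NoW metric can be sketched with a simplified stand-in: for each ground-truth scan point, measure the distance to the nearest predicted vertex (NoW itself uses point-to-surface distance after rigid alignment, which is not reproduced here).

```python
import numpy as np

def point_to_nearest_vertex_errors(scan_points, pred_vertices):
    """Distance from each ground-truth scan point to its nearest
    predicted vertex; a simplified proxy for NoW's point-to-surface
    error after rigid alignment."""
    # Pairwise distances via broadcasting: (num_points, num_vertices)
    d = np.linalg.norm(scan_points[:, None, :] - pred_vertices[None, :, :], axis=2)
    return d.min(axis=1)

# NoW then reports the median, mean, and std of these errors (in mm).
```

Benchmarks of this form reward methods that get the facial surface right everywhere, not just at sparse landmarks.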
Limitations and Future Work
The current iteration of SynthFace models variation in face shape but lacks expressive diversity. Future work could incorporate varying expressions into the generation pipeline or train separate models to predict expression independently. Exploring improved conditioning methods for Stable Diffusion, including multi-modal inputs, could further improve shape consistency in the generated images.
Regarding ethical implications, SynthFace is designed to mitigate racial and gender biases, but it does not encompass all identity groups. The authors encourage further intersectional analysis to audit model performance comprehensively across demographics.
Conclusion
The paper presents SynthFace, a novel approach to large-scale dataset generation for 3D face reconstruction that overcomes the scarcity of 3D supervisory data through the use of generative models. SynthFace is an important step toward reducing dataset bias and enabling accurate 3D face reconstruction, and its framework is readily adaptable to future advances in generative models. The dataset will be made publicly available to foster further research and development in the field.