- The paper introduces StyleRig, a rigging method built around a trained network, RigNet, that provides semantic 3D control over StyleGAN-generated facial images.
- The approach fuses 3DMM parameters with self-supervised learning and a differentiable renderer to achieve precise control and photorealism.
- Experimental results demonstrate interactive face editing and style mixing, surpassing standard StyleGAN limitations and indicating promising future directions.
An Insightful Overview of StyleRig: Rigging StyleGAN for 3D Control over Portrait Images
The paper "StyleRig: Rigging StyleGAN for 3D Control over Portrait Images" addresses a pivotal challenge at the intersection of computer vision and computer graphics: providing semantic and interpretable 3D control over photorealistic images of human faces generated by StyleGAN, a state-of-the-art generative adversarial network. This is accomplished by integrating the high-quality rendering capabilities of StyleGAN with the semantic control capabilities typically found in 3D Morphable Models (3DMMs).
Key Contributions
The authors propose a novel method, StyleRig, that introduces explicit, rig-like control over StyleGAN-generated facial imagery by training a mapping network, RigNet, on top of a pretrained and fixed StyleGAN generator. This addresses a key limitation of StyleGAN, which on its own lacks structured, semantic control over facial attributes such as head pose, expression, and scene illumination.
Methodological Approach
StyleRig is built on the fusion of two main components: 3DMMs, which offer control over semantic parameters but lack photorealism, and StyleGAN, which provides photorealistic images without semantic control. The core innovation lies in training a rigging network, RigNet, which acts as a bridge between the semantic parameter space of 3DMMs and the input space of StyleGAN.
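To make RigNet's bridging role concrete, the sketch below shows a simplified, PyTorch-style rigging network that maps a StyleGAN latent code together with a target 3DMM parameter vector to an edited latent code. The architecture, layer sizes, and parameter dimensionality (e.g. 257) are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class RigNet(nn.Module):
    """Minimal sketch of a rigging network: maps a StyleGAN latent code w and a
    target 3DMM parameter vector p to an edited latent w_hat. Dimensions and
    layer structure are illustrative assumptions."""

    def __init__(self, latent_dim: int = 512, param_dim: int = 257, hidden_dim: int = 512):
        super().__init__()
        # Encoder compresses the input latent into a low-dimensional code.
        self.encoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, 32),
        )
        # Decoder fuses the encoded latent with the 3DMM parameters and
        # predicts an additive update to the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(32 + param_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, w: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
        delta = self.decoder(torch.cat([self.encoder(w), p], dim=-1))
        return w + delta  # residual edit keeps the edited latent close to the original


# Usage: edit a batch of latent codes toward target 3DMM parameters.
rignet = RigNet()
w = torch.randn(4, 512)   # latent codes from StyleGAN's mapping network
p = torch.randn(4, 257)   # target 3DMM parameters (pose, expression, illumination, ...)
w_hat = rignet(w, p)      # edited latents, to be fed to the fixed StyleGAN generator
```

The residual formulation is one natural design choice here: keeping the edited latent close to the original helps preserve identity and image quality while only the controlled attributes change.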
RigNet is trained in a self-supervised manner, obviating the need for manually annotated datasets. A key component of this training regime is a self-supervised two-way cycle-consistency loss that keeps the generated images aligned with the semantic controls supplied by the 3DMM. This is coupled with a differentiable renderer that computes a photometric rerendering error, enforcing consistency between the edited image and its target 3D parameters.
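The sketch below illustrates the spirit of this self-supervised loss under strong simplifying assumptions. The callables `stylegan` (fixed generator), `face_recon` (differentiable 3DMM parameter estimation), and `renderer` (differentiable renderer producing a synthetic rerendering from mixed parameters) are placeholders standing in for the actual components, and the exact way source and target parameters are mixed differs from the paper.

```python
import torch

def rerendering_loss(rignet, stylegan, face_recon, renderer, w, v):
    """One direction of a self-supervised consistency loss: parameters estimated
    from the image of latent v are injected into latent w, and the edited image
    must agree with a differentiable rerendering driven by the mixed parameters.
    All components except rignet are placeholder callables (assumptions)."""
    p_w = face_recon(stylegan(w))   # 3DMM parameters of the image generated from w
    p_v = face_recon(stylegan(v))   # 3DMM parameters of the image generated from v

    w_edit = rignet(w, p_v)         # inject v's parameters into w's latent
    img_edit = stylegan(w_edit)

    # Photometric rerendering error: compare the edited image against a rendering
    # that keeps w's identity/texture and takes the transferred attributes from v
    # (the parameter mixing is folded into the placeholder `renderer`).
    photometric = torch.mean((img_edit - renderer(p_w, p_v)) ** 2)

    # Parameter consistency: re-estimating the 3DMM parameters of the edited image
    # should recover the injected controls (in the paper, only the transferred
    # subset is compared against p_v; here this is simplified).
    cycle = torch.mean((face_recon(img_edit) - p_v) ** 2)
    return photometric + cycle


def two_way_cycle_loss(rignet, stylegan, face_recon, renderer, w, v):
    # Apply the consistency loss in both directions, w -> v and v -> w.
    return (rerendering_loss(rignet, stylegan, face_recon, renderer, w, v) +
            rerendering_loss(rignet, stylegan, face_recon, renderer, v, w))
```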
Strong Numerical and Experimental Results
The authors report compelling experimental outcomes that underline the effectiveness of StyleRig. The approach enables interactive face editing through precise control over semantic parameters. StyleRig also supports style mixing, in which facial attributes such as head pose, expression, and scene illumination are transferred between images, and it can generate images conditioned on fixed semantic parameters, effectively turning an unconditional generative model into a conditional one.
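As an illustration of such attribute transfer, the following sketch copies one hypothetical sub-block of 3DMM parameters (here a pose slice whose indices are purely assumed) from a target latent into a source latent before invoking RigNet. The helper names and the parameter layout are assumptions for illustration, not the paper's interface.

```python
import torch

def transfer_pose(rignet, stylegan, face_recon, w_source, w_target,
                  pose_slice=slice(0, 3)):
    """Illustrative attribute transfer: copy the (assumed) pose sub-vector of the
    target's 3DMM parameters into the source's parameters, then let RigNet
    produce an edited latent for the fixed StyleGAN generator."""
    p_src = face_recon(stylegan(w_source))
    p_tgt = face_recon(stylegan(w_target))

    p_mixed = p_src.clone()
    p_mixed[..., pose_slice] = p_tgt[..., pose_slice]  # swap only the pose parameters

    w_edit = rignet(w_source, p_mixed)
    return stylegan(w_edit)  # source face rendered with the target's head pose
```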
Additionally, the experiments demonstrate style mixing beyond the capabilities of standard StyleGAN, manipulating distinct semantic attributes without noticeable entanglement between them.
Implications and Future Directions
The implications of this research are manifold, both practical and theoretical. Practically, StyleRig opens new avenues in photorealistic image synthesis with artist-level control, benefiting fields such as extended reality, animation, and virtual communications. From a theoretical perspective, StyleRig provides insights into the interplay between GANs and 3DMMs, potentially guiding future architectures that integrate semantic control with generative models.
The authors acknowledge limitations: some expressivity of the 3DMM parameters is lost when they are converted into latent modifications within StyleGAN, partly due to biases in the training data. These limitations point to future work, such as mitigating dataset biases and refining the differentiable face reconstruction to capture more nuanced facial details.
In conclusion, the StyleRig framework offers a substantial advancement in the field of controllable image synthesis, blending the high-fidelity outputs of state-of-the-art generative networks with the user-friendly control parameters found in morphable models. This work paves the way for future innovations that can deepen our understanding and capability in employing generative networks for structured image editing tasks.