- The paper introduces a method that merges style-based GANs with differentiable rendering to convert latent codes into explicit 3D properties.
- It uses the GAN to synthesize multi-view training data and an iterative training process with cycle consistency losses, improving the fidelity of predicted geometry and texture.
- Empirical evaluations and user studies confirm that this approach outperforms current inverse graphics networks, paving the way for accessible 3D content creation.
Image GANs Meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering
This paper integrates the capabilities of Image Generative Adversarial Networks (GANs) with differentiable rendering techniques to address challenges in inverse graphics and 3D neural rendering. Traditional methods for inverse graphics often rely on multi-view imagery, which is not always available in real-world settings. Recent GANs can manipulate object viewpoints implicitly, which presents a unique opportunity; however, their latent representations are entangled and hard to interpret, making them difficult to use for explicit 3D reasoning.
The authors introduce an approach that leverages GANs as generators of multi-view data to train an inverse graphics network, utilizing an off-the-shelf differentiable renderer. The inverse graphics network serves as a "teacher" to assist in disentangling the GAN's latent code into interpretable 3D properties. By employing an iterative training process that incorporates cycle consistency losses, they demonstrate superior performance compared to state-of-the-art inverse graphics networks, as evidenced both quantitatively and through user studies.
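To make the training recipe concrete, the following is a minimal sketch of the first stage: fitting an inverse graphics network to GAN-synthesized multi-view images through a differentiable renderer. The module names, signatures, and loss weighting (`inverse_net`, `render`, `lam_cam`) are illustrative assumptions, not the authors' code; the paper's full objective additionally includes cycle consistency terms that tie the predicted 3D properties back to the GAN's latent code.

```python
# Minimal sketch of the first training stage, under assumed interfaces:
# `inverse_net(images)` -> (vertices, textures, cameras), and
# `render(vertices, textures, cameras)` -> images, a DIB-R-style
# differentiable renderer. Not the authors' implementation.
import torch
import torch.nn.functional as F

def training_step(inverse_net, render, images, view_cameras, optimizer, lam_cam=1.0):
    """One optimization step on a pseudo multi-view batch synthesized by the GAN.

    images:       (B, 3, H, W) GAN renderings of the same object from B viewpoints
    view_cameras: (B, C)       rough camera parameters tied to the viewpoint codes
    """
    optimizer.zero_grad()

    # Predict explicit 3D properties (mesh vertices, texture, camera) per view.
    vertices, textures, pred_cameras = inverse_net(images)

    # Re-render the predictions from the annotated viewpoints and compare
    # against the GAN images, which act as pseudo ground truth.
    rendered = render(vertices, textures, view_cameras)      # (B, 3, H, W)
    loss_img = F.l1_loss(rendered, images)

    # Keep the predicted cameras consistent with the viewpoint annotations.
    loss_cam = F.mse_loss(pred_cameras, view_cameras)

    loss = loss_img + lam_cam * loss_cam
    loss.backward()
    optimizer.step()
    return loss.item()
```

A fuller objective would typically also include silhouette and perceptual terms, but the gradient path this sketch is meant to illustrate is the essential one: image loss, through the differentiable renderer, into the predicted 3D properties.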
Key Insights and Contributions:
- Integration of GANs and Differentiable Rendering: The paper merges the capabilities of StyleGAN, a state-of-the-art image synthesis network, and DIB-R, a differentiable renderer, to enhance the performance of inverse graphics networks without the need for explicit 3D supervision.
- Multi-View Dataset Generation: StyleGAN is used to synthesize diverse multi-view datasets. By manually annotating a small set of viewpoint codes with rough camera parameters, the method generates large-scale training data, circumventing the limitations of existing synthetic datasets such as ShapeNet (see the sketch after this list).
- Iterative Training and Disentanglement: A novel pipeline iteratively trains the inverse graphics network and the GAN, disentangling the GAN's latent space into meaningful 3D properties. This turns the latent space into an interpretable structure that can be driven like a 3D renderer, with viewpoint, shape, and texture specified explicitly.
- Quantitative and User-Based Evaluation: The approach outperforms inverse graphics networks trained on existing synthetic datasets, producing higher-fidelity geometry and texture predictions, as validated by both empirical metrics and user studies.
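As referenced above, the multi-view data generation relies on StyleGAN's style-mixing behavior: early (coarse) layers largely control viewpoint, so swapping them re-renders the same object from a different camera. The sketch below illustrates that idea; the `stylegan` callable, the layer split `n_view_layers`, and the latent shapes are assumptions, and the viewpoint codes would have to be selected and annotated with rough cameras by hand, as the paper describes.

```python
# Minimal sketch of pseudo multi-view generation via style mixing, assuming a
# hypothetical `stylegan` callable that accepts per-layer latent codes.
import torch

def make_multiview_batch(stylegan, view_codes, content_code, n_view_layers=4):
    """Render one object (content_code) under every selected viewpoint code.

    view_codes:   (V, L, D) per-layer latents whose early layers encode viewpoint
    content_code: (L, D)    per-layer latents for a single sampled object
    Returns (V, 3, H, W) images of the same object seen from V rough viewpoints.
    """
    images = []
    for w_view in view_codes:
        # Style mixing: coarse (early) layers from the viewpoint code,
        # remaining layers from the content code.
        w = content_code.clone()
        w[:n_view_layers] = w_view[:n_view_layers]
        images.append(stylegan(w.unsqueeze(0)))   # (1, 3, H, W)
    return torch.cat(images, dim=0)
```

Each generated batch, paired with the rough cameras annotated for its viewpoint codes, can then be fed to a training step like the one sketched earlier.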
Implications and Future Directions:
The fusion of GANs with differentiable rendering represents a significant stride in inverse graphics. Practically, it could make 3D content creation more accessible, enabling applications in virtual reality, augmented reality, and computer vision with far less data annotation. Theoretically, it points toward neural rendering models whose generative latent spaces are interpretable and controllable.
Looking ahead, this line of research opens paths to further enhance GANs and neural rendering by integrating richer lighting models and handling articulated objects. Extending the approach to more varied datasets, beyond the relatively controlled settings demonstrated here, would mark a significant step toward practical real-world use.