EpiGRAF: Rethinking training of 3D GANs (2206.10535v2)

Published 21 Jun 2022 in cs.CV, cs.AI, and cs.LG

Abstract: A very recent trend in generative modeling is building 3D-aware generators from 2D image collections. To induce the 3D bias, such models typically rely on volumetric rendering, which is expensive to employ at high resolutions. During the past months, there appeared more than 10 works that address this scaling issue by training a separate 2D decoder to upsample a low-resolution image (or a feature tensor) produced from a pure 3D generator. But this solution comes at a cost: not only does it break multi-view consistency (i.e. shape and texture change when the camera moves), but it also learns the geometry in a low fidelity. In this work, we show that it is possible to obtain a high-resolution 3D generator with SotA image quality by following a completely different route of simply training the model patch-wise. We revisit and improve this optimization scheme in two ways. First, we design a location- and scale-aware discriminator to work on patches of different proportions and spatial positions. Second, we modify the patch sampling strategy based on an annealed beta distribution to stabilize training and accelerate the convergence. The resulting model, named EpiGRAF, is an efficient, high-resolution, pure 3D generator, and we test it on four datasets (two introduced in this work) at $256^2$ and $512^2$ resolutions. It obtains state-of-the-art image quality, high-fidelity geometry and trains ${\approx} 2.5 \times$ faster than the upsampler-based counterparts. Project website: https://universome.github.io/epigraf.

EpiGRAF: Rethinking Training of 3D GANs

The paper "EpiGRAF: Rethinking Training of 3D GANs" introduces an innovative approach to address the scaling issues associated with training 3D-aware generative adversarial networks (GANs) from 2D image collections. Traditional methods impose a 3D bias through volumetric rendering, a computationally expensive endeavor at high resolutions. Recent strategies have typically employed an auxiliary 2D decoder to upsample low-resolution outputs from a 3D generator, which, however, compromised multi-view consistency and geometry fidelity. This paper proposes an alternative in EpiGRAF, a method that achieves high-resolution 3D generation by training patch-wise on image sections.

The authors enhance the patch-wise optimization scheme with two contributions. First, a location- and scale-aware discriminator processes patches of different scales and spatial positions while maintaining a consistent representation across them. Second, they modify the random patch sampling strategy, drawing patch scales from an annealed beta distribution to stabilize training and accelerate convergence. These advances culminate in EpiGRAF, a purely 3D generator evaluated on four datasets, two of them newly introduced, at resolutions of 256x256 and 512x512. The results indicate state-of-the-art image quality, improved geometric fidelity, and training approximately 2.5 times faster than upsampler-based approaches.
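The annealed sampling can be sketched as follows. The concrete schedule below (a concentration parameter decaying linearly from 10 to 1) is an assumption for illustration; the paper's exact parametrization may differ. Early in training, probability mass concentrates near scale 1.0 so the model learns global structure first; as the distribution anneals, smaller patches carrying high-frequency detail become more likely.

```python
import torch
from torch.distributions import Beta

def sample_patch_scale(step: int, total_steps: int, min_scale: float = 0.125) -> torch.Tensor:
    """Draw a patch scale from an annealed beta distribution (illustrative schedule).

    A large concentration `k` puts most probability mass near 1.0 (full-image
    patches); as `k` anneals toward 1.0 the distribution flattens, exposing the
    discriminator to progressively smaller, more detailed patches.
    """
    progress = min(step / total_steps, 1.0)
    k = 1.0 + 9.0 * (1.0 - progress)           # assumed schedule: 10.0 -> 1.0
    u = Beta(k, 1.0).sample()                  # density ~ u^(k-1), skewed toward 1
    return min_scale + (1.0 - min_scale) * u   # map [0, 1] onto [min_scale, 1]
```

In a training loop, each step would draw `scale = sample_patch_scale(step, total_steps)` and feed it to a patch extractor like the one sketched above before rendering.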

Numerical Results and Claims

The EpiGRAF model shows strong performance across benchmarks. Specifically, it outperforms leading 3D GANs such as StyleNeRF and GIRAFFE-HD on datasets with complex geometries and diverse viewing angles (such as Megascans Plants and Megascans Food). It achieves competitive Fréchet Inception Distance (FID) scores while maintaining multi-view consistency without resorting to 2D upsamplers, setting it apart as a more geometry-faithful model. The reported training cost of 24 GPU days at 512x512 resolution illustrates the model's computational advantage.
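For context on the metric, the snippet below shows a typical FID computation using the torchmetrics library. This is generic usage, not the paper's evaluation code, and the random tensors are stand-ins for batches of real and generated images.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance  # pip install torchmetrics torch-fidelity

# FID compares Inception-v3 feature statistics of real vs. generated images.
fid = FrechetInceptionDistance(feature=2048)

# Stand-in batches: uint8 images shaped (N, 3, H, W) with values in [0, 255].
real_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(fid.compute())  # lower is better
```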

Implications and Future Directions

EpiGRAF's approach has noteworthy implications for both theory and practice. From a theoretical perspective, the shift toward efficient patch-wise training with scale-aware discriminators could influence future GAN architectures, promoting more resource-efficient ways to achieve high-fidelity outputs without additional 2D processing layers. Practically, this work opens possibilities for more realistic 3D content generation, valuable in fields ranging from virtual reality to digital content creation, where spatial consistency across multiple views is crucial.

As AI continues to evolve, EpiGRAF's methodology may influence advancements in 3D-aware generative models. The relationship between 3D and 2D synthesis and the importance of harmonizing them without losing structural detail could point to new hybrid approaches. Future research might focus on integrating neural radiance field (NeRF) techniques more seamlessly for greater flexibility and control over 3D representations while maintaining computational efficiency.

Conclusion

"EpiGRAF: Rethinking Training of 3D GANs" presents a compelling case for the potential of patch-wise optimization to enhance 3D GAN training. By reimagining the generative process without reliance on upsamplers, this work contributes a significant advance in the neural synthesis of structured objects, expanding the horizons of how high-resolution 3D content is conceived and realized in machine learning frameworks. The implications of efficient, consistent, and high-quality 3D generation are vast, marking this development as a foundational step forward in the field of 3D-aware AI models.

Authors (4)
  1. Ivan Skorokhodov (38 papers)
  2. Sergey Tulyakov (108 papers)
  3. Yiqun Wang (31 papers)
  4. Peter Wonka (130 papers)
Citations (122)