A Survey on Deep Generative 3D-aware Image Synthesis (2210.14267v3)
Abstract: Recent years have seen remarkable progress in deep learning powered visual content creation. This includes deep generative 3D-aware image synthesis, which produces high-fidelity images in a 3D-consistent manner while simultaneously capturing compact surfaces of objects from pure image collections, without the need for any 3D supervision, thus bridging the gap between 2D imagery and 3D reality. The field of computer vision has recently been captivated by the task of deep generative 3D-aware image synthesis, with hundreds of papers appearing in top-tier journals and conferences over the past few years (mainly the past two years), yet a comprehensive survey of this remarkable and swift progress has been lacking. Our survey aims to introduce new researchers to this topic, provide a useful reference for related works, and stimulate future research directions through our discussion section. Apart from the presented papers, we aim to constantly update the latest relevant papers along with corresponding implementations at https://weihaox.github.io/3D-aware-Gen.
Authors: Weihao Xia, Jing-Hao Xue