
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images (2404.07474v1)

Published 11 Apr 2024 in cs.CV

Abstract: Novel view synthesis aims to generate images from new viewpoints given a collection of existing views. Recent attempts address this problem by relying on 3D geometry priors (e.g., shapes, sizes, and positions) learned from multi-view images. However, such methods encounter the following limitations: 1) they require a set of multi-view images as training data for a specific scene (e.g., face, car, or chair), which is often unavailable in real-world scenarios; 2) they fail to extract geometry priors from single-view images due to the lack of multi-view supervision. In this paper, we propose a Geometry-enhanced NeRF (G-NeRF), which seeks to enhance the geometry priors through a geometry-guided multi-view synthesis approach followed by depth-aware training. In the synthesis process, motivated by the observation that existing 3D GAN models can unconditionally synthesize high-fidelity multi-view images, we adopt off-the-shelf 3D GAN models, such as EG3D, as a free source of geometry priors by synthesizing multi-view data. To further improve the geometry quality of the synthetic data, we introduce a truncation method to effectively sample latent codes within 3D GAN models. To tackle the absence of multi-view supervision for single-view images, we design a depth-aware training approach that incorporates a depth-aware discriminator to guide geometry priors through depth maps. Experiments demonstrate the effectiveness of our method in terms of both qualitative and quantitative results.
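
The abstract mentions a truncation method for sampling latent codes but does not spell it out. As a rough illustration only, the sketch below shows the standard StyleGAN-style truncation trick, which pulls each latent code toward the mean latent; it is an assumption that G-NeRF's method resembles this, and the function names, the `psi` parameter, and the Gaussian stand-in for the generator's latent space are all hypothetical.

```python
import random


def truncate_latent(w, w_avg, psi=0.7):
    """Pull a latent code toward the mean latent: w' = w_avg + psi * (w - w_avg).

    psi = 1 leaves the code unchanged; psi = 0 collapses it onto the mean.
    This is the generic truncation trick, not necessarily the paper's variant.
    """
    if not 0.0 <= psi <= 1.0:
        raise ValueError("psi must lie in [0, 1]")
    return [a + psi * (x - a) for x, a in zip(w, w_avg)]


def sample_truncated_latents(num_samples, dim, psi=0.7, seed=0):
    """Sample latent codes and truncate them toward their empirical mean.

    Standard-normal samples stand in for the output of a 3D GAN's mapping
    network (an assumption made to keep the sketch self-contained).
    """
    rng = random.Random(seed)
    codes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_samples)]
    # Empirical mean latent, computed per dimension.
    w_avg = [sum(col) / num_samples for col in zip(*codes)]
    truncated = [truncate_latent(w, w_avg, psi) for w in codes]
    return truncated, w_avg


if __name__ == "__main__":
    latents, mean_latent = sample_truncated_latents(num_samples=8, dim=4, psi=0.5)
    print(len(latents), len(latents[0]))
```

Smaller `psi` values trade sample diversity for codes closer to the dense region of the latent space, which is the usual motivation for truncation when the synthesized data is meant to serve as clean training signal.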

Authors (7)
  1. Zixiong Huang (5 papers)
  2. Qi Chen (194 papers)
  3. Libo Sun (16 papers)
  4. Yifan Yang (578 papers)
  5. Naizhou Wang (17 papers)
  6. Mingkui Tan (124 papers)
  7. Qi Wu (323 papers)
