
NOFA: NeRF-based One-shot Facial Avatar Reconstruction (2307.03441v1)

Published 7 Jul 2023 in cs.CV

Abstract: 3D facial avatar reconstruction has been a significant research topic in computer graphics and computer vision, where photo-realistic rendering and flexible control over poses and expressions are necessary for many related applications. Recently, its performance has been greatly improved by the development of neural radiance fields (NeRF). However, most existing NeRF-based facial avatars focus on subject-specific reconstruction and reenactment: they require multi-shot images containing different views of the specific subject for training, and the learned model cannot generalize to new identities, which limits further applications. In this work, we propose a one-shot 3D facial avatar reconstruction framework that requires only a single source image to reconstruct a high-fidelity 3D facial avatar. To address the lack of generalization ability and the missing multi-view information, we leverage the generative prior of a 3D GAN and develop an efficient encoder-decoder network to reconstruct the canonical neural volume of the source image, and we further propose a compensation network to complement facial details. To enable fine-grained control over facial dynamics, we propose a deformation field that warps the canonical volume into driven expressions. Through extensive experimental comparisons, we achieve superior synthesis results compared to several state-of-the-art methods.

Authors (11)
  1. Wangbo Yu (15 papers)
  2. Yanbo Fan (46 papers)
  3. Yong Zhang (660 papers)
  4. Xuan Wang (205 papers)
  5. Fei Yin (36 papers)
  6. Yunpeng Bai (35 papers)
  7. Yan-Pei Cao (58 papers)
  8. Ying Shan (252 papers)
  9. Yang Wu (175 papers)
  10. Zhongqian Sun (10 papers)
  11. Baoyuan Wu (107 papers)
Citations (24)
