
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis (2402.17364v1)

Published 27 Feb 2024 in cs.CV

Abstract: Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences. These implicit methods still suffer from visual artifacts and jitter, since the lack of explicit geometric constraints poses a fundamental challenge in accurately modeling complex facial deformations. In this paper, we introduce Dynamic Tetrahedra (DynTet), a novel hybrid representation that encodes explicit dynamic meshes with neural networks to ensure geometric consistency across various motions and viewpoints. DynTet is parameterized by coordinate-based networks that learn signed distance, deformation, and material texture, anchoring the training data to a predefined tetrahedral grid. Leveraging Marching Tetrahedra, DynTet efficiently decodes textured meshes with a consistent topology, enabling fast rendering through a differentiable rasterizer and supervision via a pixel loss. To enhance training efficiency, we incorporate classical 3D Morphable Models to facilitate geometry learning and define a canonical space to simplify texture learning. These advantages are readily achievable owing to the effective geometric representation employed in DynTet. Compared with prior works, DynTet demonstrates significant improvements in fidelity, lip synchronization, and real-time performance according to various metrics. Beyond producing stable and visually appealing synthesized videos, our method also outputs dynamic meshes, which is promising for enabling many emerging applications.
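The Marching Tetrahedra step mentioned in the abstract extracts a triangle mesh from signed distance values stored at tetrahedral grid vertices. The sketch below is a minimal illustration of that core operation (not the paper's implementation): for a single tetrahedron, it encodes the sign pattern of the four corner SDF values as a case index and linearly interpolates the zero crossing on each sign-changing edge. All function and variable names here are hypothetical.

```python
import numpy as np

# The 6 edges of a tetrahedron, as pairs of vertex indices.
TET_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def marching_tet_case(sdf):
    """Encode the sign pattern of the 4 SDF values as a 4-bit case index (0-15)."""
    return sum(1 << i for i in range(4) if sdf[i] < 0)

def surface_crossings(verts, sdf):
    """Linearly interpolate the zero crossing on each sign-changing edge.

    verts: (4, 3) array of tetrahedron vertex positions
    sdf:   (4,) signed distance at each vertex
    Returns a list of 3D points where the implicit surface cuts the edges.
    """
    points = []
    for a, b in TET_EDGES:
        if (sdf[a] < 0) != (sdf[b] < 0):      # edge crosses the surface
            t = sdf[a] / (sdf[a] - sdf[b])    # linear zero of the SDF along the edge
            points.append(verts[a] + t * (verts[b] - verts[a]))
    return points

# Unit tetrahedron with one vertex inside the surface (negative SDF):
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
sdf = np.array([-0.5, 0.5, 0.5, 0.5])
case = marching_tet_case(sdf)        # case 1: only vertex 0 is inside
pts = surface_crossings(verts, sdf)  # 3 crossings, forming one triangle
```

Because each tetrahedron yields at most two triangles and the case table is fixed, the extraction keeps a consistent topology across frames, which is what makes differentiable rasterization and pixel-loss supervision straightforward in hybrid representations of this kind.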

Citations (3)
