HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs (2312.14140v1)

Published 21 Dec 2023 in cs.CV

Abstract: Current advances in human head modeling make it possible to generate plausible-looking 3D head models via neural representations. Nevertheless, constructing complete high-fidelity head models with explicitly controlled animation remains an issue. Furthermore, completing the head geometry based on a partial observation, e.g. from a depth sensor, while preserving details is often problematic for existing methods. We introduce a generative model for detailed 3D head meshes on top of an articulated 3DMM, which allows explicit animation and high-detail preservation at the same time. Our method is trained in two stages. First, we register a parametric head model with vertex displacements to each mesh of the recently introduced NPHM dataset of accurate 3D head scans. The estimated displacements are baked into a hand-crafted UV layout. Second, we train a StyleGAN model to generalize over the UV maps of displacements. The decomposition into a parametric model and high-quality vertex displacements allows us to animate the model and modify it semantically. We demonstrate the results of unconditional generation and fitting to full or partial observations. The project page is available at https://seva100.github.io/headcraft.

Summary

  • The paper presents a two-stage approach that first aligns a parametric head template with 3D scans and then refines surface displacements via UV maps.
  • The method employs a StyleGAN-based generative model to learn intricate head variations, including complex hairstyles and subtle facial details.
  • Quantitative and visual evaluations demonstrate the model's improved realism and adaptability for animation in various applications.

Introduction to Generative Head Modeling

Advances in neural modeling techniques have made it possible to create increasingly realistic 3D human head models. Such models matter not only in entertainment and virtual reality but also in industries such as medical simulation and digital communication. The crucial challenge is generating models that are highly detailed yet can be easily manipulated for animation and tracking without losing that detail.

Methodology

Crafting Detailed Displacements

HeadCraft creates detailed 3D human head models in two main stages. First, a parametric head template, which supports animation and captures coarse shape variation, is registered to a comprehensive database of 3D head scans. This fitting deforms the template mesh toward each scanned head, letting vertices move freely to capture fine surface detail. In the first phase of fitting, these deformations are regularized to avoid mesh self-intersections; a second phase then refines the displacements along the head's surface normals.
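As a concrete illustration, below is a minimal sketch of such a two-stage fitting loop written with PyTorch3D (which appears in the paper's references). The function and variable names (`fit_displacements`, `template_verts`, `scan_points`), the loss weights, and the choice of Laplacian smoothing as the stage-one regularizer are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of two-stage template-to-scan fitting; assumes PyTorch3D.
# template_verts: (V, 3), faces: (F, 3), scan_points: (P, 3) -- illustrative names.
import torch
from pytorch3d.structures import Meshes
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing

def fit_displacements(template_verts, faces, scan_points, iters=500):
    # Stage 1: free per-vertex offsets; the smoothness term regulates the
    # deformation (a stand-in for the paper's anti-self-intersection terms).
    offsets = torch.zeros_like(template_verts, requires_grad=True)
    opt = torch.optim.Adam([offsets], lr=1e-3)
    for _ in range(iters):
        mesh = Meshes(verts=[template_verts + offsets], faces=[faces])
        pts = sample_points_from_meshes(mesh, num_samples=20000)
        data_loss, _ = chamfer_distance(pts, scan_points[None])
        loss = data_loss + 0.1 * mesh_laplacian_smoothing(mesh)
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: refine with one scalar offset per vertex along the surface
    # normal; normals are kept fixed here for simplicity.
    base = (template_verts + offsets).detach()
    normals = Meshes(verts=[base], faces=[faces]).verts_normals_packed()
    scale = torch.zeros(base.shape[0], 1, requires_grad=True)
    opt = torch.optim.Adam([scale], lr=1e-3)
    for _ in range(iters):
        mesh = Meshes(verts=[base + scale * normals], faces=[faces])
        pts = sample_points_from_meshes(mesh, num_samples=20000)
        loss, _ = chamfer_distance(pts, scan_points[None])
        opt.zero_grad(); loss.backward(); opt.step()
    return base + scale.detach() * normals  # final detailed vertices
```

Freezing the normals between the two stages keeps stage two a simple per-vertex scalar optimization; recomputing them each step would be closer to a true normal-direction refinement.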

To capture and reproduce the intricate variation observed across human heads, these displacements are baked into 2D UV maps that a generative model can learn from. A generative model based on the StyleGAN architecture is then trained on these UV displacement maps, encoding the rich surface detail in a form that generalizes across identities.
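As a rough illustration of the baking step, the sketch below splats per-vertex displacements into a texture under the assumption of a known per-vertex UV layout; a production bake would rasterize each triangle with barycentric interpolation, and all names here are hypothetical.

```python
import numpy as np

def bake_displacements(uv, displacements, res=256):
    """Splat per-vertex 3D displacement vectors into a UV texture.

    uv: (V, 2) coordinates in [0, 1] from the hand-crafted layout (assumed given).
    displacements: (V, 3) offsets from the registration stage.
    """
    tex = np.zeros((res, res, 3), dtype=np.float32)
    px = np.clip((uv * (res - 1)).round().astype(int), 0, res - 1)
    tex[px[:, 1], px[:, 0]] = displacements  # nearest-texel splat, one vertex per texel
    return tex
```

The resulting three-channel textures can then be fed to a StyleGAN training pipeline much as ordinary RGB images would be.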

Versatility and Adaptability

Learning from 2D UV displacement maps lets the method produce high-resolution detail that extends well beyond the coarse geometry of the parametric model, and it introduces considerable shape variation, including complex hairstyles. This yields a high degree of detail and variability when generating new head models or adapting existing ones to different shapes.
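Applying a map (generated or baked) back to the template is the inverse lookup: sample the displacement texture at each vertex's UV coordinate and offset the vertex. A minimal bilinear version using `torch.nn.functional.grid_sample` is sketched below; the tensor shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def apply_displacement_map(verts, uv, disp_map):
    """Offset template vertices by displacements sampled from a UV texture.

    verts: (V, 3) template vertices; uv: (V, 2) in [0, 1];
    disp_map: (1, 3, H, W), e.g. a StyleGAN sample.
    """
    grid = (uv * 2.0 - 1.0).view(1, -1, 1, 2)                  # grid_sample expects [-1, 1]
    disp = F.grid_sample(disp_map, grid, align_corners=True)   # (1, 3, V, 1)
    return verts + disp[0, :, :, 0].t()                        # (V, 3)
```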

Evaluation and Applications

To demonstrate the effectiveness and practicality of the generated models, several evaluations are conducted. The diversity and fidelity of the generated heads are compared with existing methods and measured against real human 3D scans, quantified with standard metrics and inspected visually in both UV-map space and rendered-image space.

The applications demonstrate the model's capacity to generate 3D heads unconditionally and to fit them to complete or partial observations, such as point clouds from depth sensors. Particularly noteworthy is the ability to animate and manipulate the generated heads: the underlying parametric template allows explicit expression and pose adjustments.
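A hedged sketch of fitting to a partial observation, reusing `apply_displacement_map` from above: optimize the latent code of a trained generator so that the displaced template explains the observed point cloud. Here `generator` stands for any callable mapping a latent vector to a `(1, 3, H, W)` displacement map; its interface, the latent dimensionality, and the symmetric Chamfer loss are assumptions (for truly partial scans, a one-sided scan-to-mesh distance would be preferable).

```python
import torch
from pytorch3d.structures import Meshes
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance

def fit_latent(generator, template_verts, faces, uv, observed_pts,
               latent_dim=512, iters=300):
    z = torch.zeros(1, latent_dim, requires_grad=True)  # latent to optimize
    opt = torch.optim.Adam([z], lr=1e-2)
    for _ in range(iters):
        disp_map = generator(z)                          # (1, 3, H, W), assumed API
        verts = apply_displacement_map(template_verts, uv, disp_map)
        mesh = Meshes(verts=[verts], faces=[faces])
        pts = sample_points_from_meshes(mesh, num_samples=20000)
        loss, _ = chamfer_distance(observed_pts[None], pts)
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()
```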

Conclusions

The two-stage registration procedure that aligns the parametric model with scanned data yields detailed displacement maps that significantly enhance the model's realism and variability. By training a StyleGAN to generalize over these high-resolution maps, the authors establish a method capable of generating detailed, animatable 3D head models. Quantitative and visual results showing high levels of detail and diversity highlight the method's utility across a range of settings and applications.
