
Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes (2404.01543v1)

Published 2 Apr 2024 in cs.CV and cs.GR

Abstract: 3D head avatars built with neural implicit volumetric representations have achieved unprecedented levels of photorealism. However, the computational cost of these methods remains a significant barrier to their widespread adoption, particularly in real-time applications such as virtual reality and teleconferencing. While attempts have been made to develop fast neural rendering approaches for static scenes, these methods cannot simply be employed to support realistic facial expressions, such as in the case of a dynamic facial performance. To address these challenges, we propose a novel fast 3D neural implicit head avatar model that achieves real-time rendering while maintaining fine-grained controllability and high rendering quality. Our key idea lies in the introduction of local hash table blendshapes, which are learned and attached to the vertices of an underlying face parametric model. These per-vertex hash tables are linearly merged with weights predicted via a CNN, resulting in expression-dependent embeddings. Our novel representation enables efficient density and color predictions using a lightweight MLP, which is further accelerated by a hierarchical nearest neighbor search method. Extensive experiments show that our approach runs in real-time while achieving comparable rendering quality to state-of-the-art methods and decent results on challenging expressions.


Summary

  • The paper presents a novel real-time rendering approach for photorealistic 3D head avatars using mesh-anchored hash table blendshapes.
  • It employs a lightweight MLP and hierarchical k-NN search to efficiently blend vertex-level deformations, achieving over 30 FPS at 512x512 resolution.
  • The method demonstrates superior facial expression control and visual quality, outpacing traditional implicit volumetric techniques for interactive applications.

Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes

Introduction to Mesh-anchored Hash Table Blendshapes

The creation of photorealistic human avatars has advanced significantly with the adoption of neural implicit volumetric representations. However, the computational demands of existing methods restrict their use in real-time scenarios such as virtual reality and teleconferencing. This paper introduces an approach to constructing 3D neural implicit head avatars that achieves real-time rendering without compromising the visual fidelity and controllability required for dynamic facial expressions.

The cornerstone of this method is the development of local hash table blendshapes, which are strategically integrated with the vertices of a face parametric model. These blendshapes operate at a vertex level, allowing for more nuanced and localized facial expressions by linearly merging embeddings produced by a convolutional neural network. The adoption of a lightweight Multilayer Perceptron (MLP) alongside a hierarchical nearest neighbor search method forms the basis for efficient density and color predictions, enabling real-time rendering.
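The linear merging step described above can be sketched compactly. The snippet below is an illustrative NumPy sketch, not the paper's implementation: the tensor shapes, the function name `blend_hash_tables`, and the assumption that the CNN outputs one weight per blendshape per vertex are all assumptions made for the example.

```python
import numpy as np

def blend_hash_tables(vertex_tables, weights):
    """Linearly merge per-vertex hash-table blendshapes into a single
    expression-dependent embedding table per vertex.

    vertex_tables: (num_blendshapes, num_vertices, table_size, feat_dim)
        learned hash-table blendshapes anchored at mesh vertices.
    weights: (num_blendshapes, num_vertices)
        expression-dependent blend weights (predicted by a CNN in the paper).
    Returns: (num_vertices, table_size, feat_dim) merged tables.
    """
    # Weighted sum over the blendshape axis, independently per vertex.
    return np.einsum('bv,bvtf->vtf', weights, vertex_tables)
```

Because the merge is a plain weighted sum, it can be recomputed cheaply every frame as the expression weights change, while the hash tables themselves stay fixed after training.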

Mesh-anchored Hash Table Blendshapes: The Core Representation

The model employs mesh-anchored hash table blendshapes where multiple, smaller hash tables are associated with the vertices of a 3D morphable model (3DMM). This ensures that each vertex's local deformations significantly influence the surrounding area, enhancing the granularity of expressions and overall model expressiveness. The real-time rendering is made possible by the use of a very lightweight MLP, with the acceleration attributed to hierarchical k-nearest-neighbor searches for embedding retrieval.
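To make the "very lightweight MLP" concrete, here is a minimal sketch of a two-layer decoder mapping a blended embedding to density and color. The layer sizes, the class name `TinyMLP`, and the exact activations are assumptions for illustration; the paper's decoder may differ in all of these details.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyMLP:
    """Minimal two-layer MLP decoding a blended hash-table embedding
    into volumetric density and RGB color (illustrative sketch)."""

    def __init__(self, in_dim=8, hidden=16):
        # Small random weights; a real model would be trained end to end.
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 4))  # 1 density + 3 color

    def __call__(self, emb):
        h = np.maximum(emb @ self.w1, 0.0)            # ReLU hidden layer
        out = h @ self.w2
        density = np.exp(out[..., :1])                # keep density positive
        color = 1.0 / (1.0 + np.exp(-out[..., 1:]))   # RGB in [0, 1]
        return density, color
```

Keeping the decoder this shallow is what makes per-sample evaluation cheap enough for real-time volume rendering: most of the representational capacity lives in the hash-table embeddings, not in the MLP.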

Hierarchical k-Nearest-Neighbor Search

To further expedite the rendering speed, this work introduces a novel hierarchical k-nearest-neighbor (k-NN) search strategy. By organizing query points into clusters, the method efficiently narrows down the search space for neighbor vertices, which critically contributes to achieving real-time rendering speeds without sacrificing visual quality.
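The cluster-then-refine idea can be sketched in a few lines: group query points coarsely, shortlist candidate vertices near each cluster center, then run an exact k-NN only within the shortlist. This is a simplified stand-in for the paper's method; the clustering scheme, parameter names, and shortlist size are all assumptions, and with a small shortlist the result is approximate.

```python
import numpy as np

def hierarchical_knn(queries, vertices, k=2, n_candidates=8):
    """Two-stage k-NN: coarse cluster-level pruning, then exact search
    within the shortlisted candidate vertices (illustrative sketch)."""
    # Coarse stage: pick a handful of query points as cluster centers.
    n_clusters = max(1, len(queries) // 32)
    centers = queries[:: max(1, len(queries) // n_clusters)][:n_clusters]
    # Assign each query to its nearest cluster center.
    assign = np.argmin(((queries[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    out = np.empty((len(queries), k), dtype=int)
    for c in range(len(centers)):
        mask = assign == c
        if not mask.any():
            continue
        # Shortlist the vertices nearest to this cluster's center.
        d_center = ((vertices - centers[c]) ** 2).sum(-1)
        cand = np.argsort(d_center)[:n_candidates]
        # Fine stage: exact k-NN restricted to the shortlist.
        d = ((queries[mask][:, None] - vertices[cand][None]) ** 2).sum(-1)
        out[mask] = cand[np.argsort(d, axis=1)[:, :k]]
    return out
```

The speedup comes from the fine stage touching only `n_candidates` vertices per cluster instead of the whole mesh, at the cost of missing a true neighbor when it falls outside the shortlist.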

Experimental Validation and Results

The proposed approach consistently outperforms prior methods in rendering speed, achieving over 30 frames per second at 512×512 resolution in real-time scenarios, while maintaining comparable, and in certain cases superior, visual quality relative to state-of-the-art high-quality 3D avatars. The experiments underscore the model's aptitude for rendering challenging expressions more accurately than current efficient avatars, establishing it as a significant advancement in the field.

Theoretical and Practical Implications

This model's innovative blend of efficiency, quality, and controllability heralds a new direction for the development of 3D head avatars, particularly for real-time applications. The method elegantly circumvents the computational challenges traditionally associated with neural implicit models, without compromising on the ability to generate dynamic, photorealistic facial expressions. Further exploration into optimizing this approach could lead to broader applications, including more immersive virtual reality experiences and more realistic telepresence in video conferencing.

Conclusion

This paper presents a groundbreaking method for creating high-fidelity, controllable 3D head avatars capable of real-time rendering. The introduction of mesh-anchored hash table blendshapes, combined with a hierarchical k-NN search, represents a significant technological advancement, pushing the boundaries of what is possible in the domain of virtual human representation. Future work will undoubtedly build on this foundation, exploring new realms of efficiency and realism in digital human modeling.
