
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting (2403.05087v1)

Published 8 Mar 2024 in cs.GR and cs.CV

Abstract: We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans where the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by mesh, which empowers its compatibility with various animation techniques, e.g., skeletal animation, blend shapes, and mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets.


Summary

  • The paper presents a novel hybrid representation that integrates mesh-based geometry with Gaussian splatting for real-time photorealistic avatar rendering.
  • It employs a dual-layer approach by disentangling motion from appearance to achieve efficient and detailed reconstruction.
  • Empirical evaluations show the method attains over 300 FPS on high-end GPUs and 30 FPS on mobile, highlighting its practical real-time applications.

Overview of SplattingAvatar: Realistic Real-Time Human Avatars

The paper presents SplattingAvatar, a method for creating photorealistic human avatars from a hybrid 3D representation: Gaussian Splatting embedded on a triangle mesh. The approach renders in real time on both high-performance GPUs and mobile devices, achieving over 300 FPS on an NVIDIA RTX 3090 and 30 FPS on an iPhone 13. By disentangling motion from appearance, SplattingAvatar maintains fidelity and efficiency without the heavy per-frame inference of purely implicit representations such as NeRF-based avatars.

Approach and Methodology

SplattingAvatar leverages a dual-layer representation: the triangle mesh captures low-frequency motion and surface deformation, while Gaussian splats model high-frequency geometry and detailed appearance. This separation lets the system use explicit mesh-based geometry for motion control and implicit Gaussian splats for rendering. Unlike conventional methods that depend on an MLP-based linear blend skinning (LBS) field for motion, SplattingAvatar drives each Gaussian's rotation and translation directly from the deformation of its host triangle, which makes the representation compatible with standard animation techniques such as skeletal animation, blend shapes, and mesh editing (see the sketch below).
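
To make this mesh-driven control concrete, the following is a minimal NumPy sketch of how a mesh-embedded splat could inherit its pose from its host triangle. It is a reconstruction based on the paper's description, not the authors' code; the function name and the tangent-frame construction are illustrative assumptions.

```python
import numpy as np

def splat_pose_on_mesh(verts, normals, tri, bary, disp):
    """Pose of one mesh-embedded Gaussian (illustrative sketch).

    verts   : (V, 3) deformed vertex positions
    normals : (V, 3) per-vertex unit normals
    tri     : (3,)  vertex indices of the host triangle
    bary    : (3,)  barycentric coordinates of the embedding
    disp    : float displacement along the interpolated normal
    """
    v = verts[tri]                        # (3, 3) triangle corners
    n = normals[tri]                      # (3, 3) corner normals
    # Phong-style interpolation: smooth position and normal over the face.
    n_hat = bary @ n
    n_hat /= np.linalg.norm(n_hat)
    position = bary @ v + disp * n_hat    # lift the splat off the surface

    # A tangent frame of the deformed triangle drives the splat's rotation,
    # so the Gaussian moves with the mesh without any LBS MLP.
    t = v[1] - v[0]
    t -= np.dot(t, n_hat) * n_hat         # Gram-Schmidt against the normal
    t /= np.linalg.norm(t)
    b = np.cross(n_hat, t)
    rotation = np.column_stack([t, b, n_hat])  # 3x3 frame, world <- local
    return position, rotation
```

Because the pose is a pure function of the deformed mesh, any technique that moves the mesh (skeletal animation, blend shapes, manual editing) moves the splats for free.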

A central innovation is the trainable Gaussian embedding: each splat is parameterized by barycentric coordinates on a triangle plus a displacement along the Phong-interpolated surface normal. Because the embedding itself is optimizable, Gaussians can walk across the mesh surface during training, with a lifted optimization framework refining the Gaussian parameters and their embeddings simultaneously. This avoids the rigid attachment to fixed mesh vertices that constrains earlier mesh-bound approaches; a simplified sketch of the walking step follows.
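
The walking behavior can be pictured as a projection step after each gradient update: if the optimized barycentric coordinates leave the valid simplex, the embedding is handed to the adjacent triangle. The sketch below is an assumed, simplified version of that step; `walk_embedding` and `tri_neighbors` are hypothetical names, and a full implementation would also re-express the coordinates in the neighbor's vertex order.

```python
import numpy as np

def walk_embedding(tri_idx, bary, tri_neighbors):
    """Re-anchor an embedding whose barycentric update left its triangle.

    tri_idx       : index of the current host triangle
    bary          : (3,) updated barycentric coords; may contain negatives
    tri_neighbors : (T, 3) neighbor triangle across the edge opposite each
                    vertex, or -1 at a mesh boundary
    """
    k = int(np.argmin(bary))
    if bary[k] >= 0.0:                        # still inside the triangle
        return tri_idx, bary / bary.sum()
    nbr = int(tri_neighbors[tri_idx, k])      # face across the crossed edge
    bary = np.clip(bary, 0.0, None)           # project back onto the simplex
    bary /= bary.sum()
    if nbr < 0:                               # mesh boundary: stay and clamp
        return tri_idx, bary
    # Hand the embedding to the neighbor; re-expressing bary in the
    # neighbor's vertex order is omitted here for brevity.
    return nbr, bary
```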

Key Contributions and Results

  1. Integration of Gaussian Splatting with Meshes: The paper proposes a unified framework that combines Gaussian Splatting with mesh controls, offering an improved method for avatar representation that balances realism with computational efficiency.
  2. Lifted Optimization for Enhanced Reconstruction: SplattingAvatar uses lifted optimization to refine Gaussian parameters and mesh embeddings simultaneously, improving the fidelity of both appearance and motion (a training-loop sketch follows this list).
  3. Real-Time Rendering Demonstrations: The method supports real-time applications with comprehensive evaluations and a successful Unity implementation. Rendering performance metrics highlight significant enhancements compared to existing state-of-the-art techniques.
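
To make item 2 concrete, here is a minimal PyTorch-style sketch of one lifted-optimization training step, in which appearance parameters and mesh embeddings receive gradients from the same photometric loss. All helper callables (`embed_on_mesh`, `splat_render`, `walk_all`) and the data layout are hypothetical stand-ins under stated assumptions, not the authors' implementation.

```python
import torch

def lifted_training_step(appearance, bary, disp, tri_idx, batch, opt,
                         embed_on_mesh, splat_render, walk_all):
    """One lifted-optimization step (hypothetical sketch).

    appearance : dict of per-splat Parameters (opacity, color, scale, rotation)
    bary, disp : trainable embedding Parameters (barycentric coords, offset)
    tri_idx    : (N,) host-triangle index per splat (discrete, not trained)
    batch      : (image, camera, posed_mesh) for one training frame
    The three callables stand in for the mesh embedding, the differentiable
    splat renderer, and the barycentric walking step.
    """
    image, camera, posed_mesh = batch
    pos, rot = embed_on_mesh(posed_mesh, tri_idx, bary, disp)
    rendered = splat_render(pos, rot, appearance, camera)  # differentiable
    loss = (rendered - image).abs().mean()                 # photometric L1
    opt.zero_grad()
    loss.backward()        # the same loss updates appearance AND embeddings
    opt.step()
    with torch.no_grad():  # discrete part of the lift: walk across faces
        tri_idx, bary.data = walk_all(tri_idx, bary.data, posed_mesh)
    return loss.item(), tri_idx
```

The design choice worth noting is that the continuous parameters are updated by gradient descent while the discrete triangle assignment is updated by the walking step outside the autograd graph, which is what lets the splats migrate over the surface during training.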

Empirical evaluations across multiple datasets demonstrate SplattingAvatar's ability to achieve superior rendering quality. Visual and quantitative analyses show marked improvements in capturing complex geometries, especially in regions demanding high-frequency detail, such as facial features and accessories.

Implications and Future Work

The proposed method addresses several challenges in avatar representation, particularly the balance between detail and efficiency. By decoupling motion from appearance through the Gaussian-mesh embedding, SplattingAvatar improves avatar realism without requiring extensive computational resources, with direct implications for gaming, extended reality (XR), and real-time telepresence.

Future research could extend the disentangled mesh representation, for example with separate meshes for dynamic components such as clothing and hair. Porting the renderer to additional platforms and engines, beyond the existing Unity implementation, could further broaden the method's applicability.

SplattingAvatar represents a robust step forward in real-time avatar rendering, demonstrating the potential for highly detailed and animatable virtual humans within practical resource constraints.
