SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting (2403.05087v1)
Abstract: We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders at over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. In this hybrid representation, the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly through the mesh, which makes the representation compatible with various animation techniques, e.g., skeletal animation, blend shapes, and mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets.
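The mesh embedding described in the abstract can be illustrated with a minimal sketch. It assumes that a Gaussian's center is the barycentric interpolation of a triangle's vertices, offset by a scalar displacement along the interpolated (Phong) vertex normal. The function name `embed_point`, its argument layout, and the toy one-triangle mesh are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def embed_point(vertices, normals, faces, face_id, bary, disp):
    """Return a 3D point attached to triangle `face_id` of a mesh.

    Assumed parameterization (illustrative only):
      position = sum_i bary[i] * vertex_i + disp * normalize(sum_i bary[i] * normal_i)
    """
    tri = faces[face_id]                 # indices of the triangle's three vertices
    w = np.asarray(bary, dtype=float)    # barycentric weights, expected to sum to 1
    base = w @ vertices[tri]             # point on the flat triangle
    n = w @ normals[tri]                 # Phong-interpolated normal
    n = n / np.linalg.norm(n)            # renormalize after interpolation
    return base + disp * n

# Toy mesh: a single triangle in the z = 0 plane with upward vertex normals.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
normals = np.array([[0.0, 0.0, 1.0]] * 3)
faces = np.array([[0, 1, 2]])
bary = (1 / 3, 1 / 3, 1 / 3)

p_rest = embed_point(vertices, normals, faces, 0, bary, 0.1)

# Animating the mesh (here a plain translation) moves the embedded point with it,
# without any learned skinning field.
moved = vertices + np.array([0.0, 0.0, 2.0])
p_moved = embed_point(moved, normals, faces, 0, bary, 0.1)
```

Because the point is stored as (triangle index, barycentric weights, displacement) rather than as world coordinates, any technique that deforms the mesh carries the point along. This is the property the abstract credits for compatibility with skeletal animation, blend shapes, and mesh editing.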