- The paper presents the SGIA framework that employs a novel 2D Gaussian surfel representation with full PBR attributes for dynamic and accurate human reconstruction.
- It utilizes pre-integrated lighting techniques that separate the rendering equation into diffuse and specular components, enabling far faster light computation than traditional Monte Carlo sampling.
- Experimental results demonstrate significant improvements with up to 5x faster training and 100x faster rendering, achieving superior PSNR, SSIM, and LPIPS scores compared to state-of-the-art methods.
An Overview of Surfel-based Gaussian Inverse Rendering for Dynamic Human Reconstruction from Monocular Videos
The paper "Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Videos" by Yiqun Zhao et al. introduces a methodology named the Surfel-based Gaussian Inverse Avatar (SGIA). SGIA reconstructs and relights dynamic human avatars with Physically-Based Rendering (PBR) properties from monocular video inputs, offering considerable improvements over existing methods in both speed and accuracy of rendering under diverse lighting conditions.
Methodology and Innovations
Canonical PBR-aware 2DGS Representation
The SGIA framework employs a representation based on 2D Gaussians (2DGS) embedded in 3D space, which are referred to as surfels. These surfels include a comprehensive set of PBR attributes: position, rotation, scale, opacity, appearance, albedo, roughness, and metallic properties. This detailed representation allows for more precise and physically accurate rendering of human avatars under various lighting conditions.
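To make the attribute set concrete, the following is a minimal sketch of one PBR-aware surfel record. The class and field names are illustrative, not the paper's actual data structures; the normal is taken as the disk's local z-axis, which is the standard convention for 2D Gaussian surfels.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PBRSurfel:
    """Illustrative 2D Gaussian surfel carrying PBR attributes."""
    position: np.ndarray   # (3,) center in canonical space
    rotation: np.ndarray   # (4,) unit quaternion (w, x, y, z) orienting the disk
    scale: np.ndarray      # (2,) in-plane extents (2D Gaussian, zero thickness)
    opacity: float         # alpha used during splatting
    albedo: np.ndarray     # (3,) diffuse base color
    roughness: float       # scalar in [0, 1]
    metallic: float        # scalar in [0, 1]

    def normal(self) -> np.ndarray:
        """Surfel normal: the disk's local z-axis rotated into world space."""
        w, x, y, z = self.rotation
        # Third column of the quaternion's rotation matrix
        return np.array([2*(x*z + w*y), 2*(y*z - w*x), 1 - 2*(x*x + y*y)])
```

Because each surfel is a flat disk rather than a 3D ellipsoid, it carries a well-defined normal for free, which is what makes the representation convenient for inverse rendering.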
Linear Blend Skinning (LBS)
SGIA's body deformation is managed via Linear Blend Skinning (LBS) applied to the canonical 2DGS. The canonical Gaussians are first initialized based on a Skinned Multi-Person Linear Model (SMPL). Each Gaussian's parameters are then transformed to the observation space using LBS, allowing for realistic animation and adaptation to different poses.
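The LBS step above can be sketched as follows: each point's deformation matrix is a weighted blend of per-joint rigid transforms. This is a generic LBS implementation under assumed array shapes, not SGIA's actual code (which also transforms each Gaussian's rotation and covariance, omitted here for brevity).

```python
import numpy as np

def lbs_transform(x_canonical, bone_transforms, skinning_weights):
    """
    Deform canonical points into observation space via Linear Blend Skinning.
    x_canonical:      (N, 3) canonical Gaussian centers
    bone_transforms:  (J, 4, 4) per-joint rigid transforms for the current pose
    skinning_weights: (N, J) per-point weights, each row summing to 1
    Returns (N, 3) deformed positions.
    """
    # Blend the per-joint transforms into one 4x4 matrix per point
    blended = np.einsum('nj,jab->nab', skinning_weights, bone_transforms)
    # Apply each blended transform to its point in homogeneous coordinates
    x_h = np.concatenate([x_canonical, np.ones((len(x_canonical), 1))], axis=1)
    x_obs = np.einsum('nab,nb->na', blended, x_h)
    return x_obs[:, :3]
```

In practice the skinning weights for each Gaussian are inherited from the nearest SMPL vertices at initialization, so the deformation stays consistent with the underlying body model.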
Physically-Based Rendering with Pre-Integrated Lighting
SGIA introduces a technique leveraging image-based lighting and pre-integration to swiftly calculate light interactions, substantially faster than traditional Monte Carlo sampling. Specifically, this involves separating the rendering equation into diffuse and specular components and pre-integrating each term so it can be queried rapidly during rendering.
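The separation described above follows the general pattern of split-sum image-based lighting: the diffuse term becomes a single irradiance lookup by normal, and the specular term becomes a pre-filtered environment lookup modulated by a pre-integrated BRDF table. The sketch below illustrates that pattern under assumptions of my own; the pre-integrated maps are modeled as plain callables (in a real renderer they would be mip-mapped environment maps and a 2D LUT texture), and it is not a reproduction of the paper's exact formulation.

```python
import numpy as np

def shade_preintegrated(albedo, roughness, metallic, n, v,
                        irradiance_map, prefiltered_env, brdf_lut):
    """
    Split-sum shading sketch. The three map arguments are assumed callables:
      irradiance_map(normal)            -> (3,) pre-integrated diffuse light
      prefiltered_env(refl, roughness)  -> (3,) pre-filtered specular light
      brdf_lut(n_dot_v, roughness)      -> (scale, bias) pre-integrated BRDF
    """
    n = n / np.linalg.norm(n)
    v = v / np.linalg.norm(v)
    n_dot_v = max(float(np.dot(n, v)), 0.0)
    r = 2.0 * n_dot_v * n - v               # mirror reflection direction

    # Diffuse: light pre-integrated over the hemisphere around the normal
    diffuse = albedo * irradiance_map(n)

    # Specular: pre-filtered environment color times the pre-integrated
    # BRDF response, with base reflectance blended by the metallic value
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic
    scale, bias = brdf_lut(n_dot_v, roughness)
    specular = prefiltered_env(r, roughness) * (f0 * scale + bias)

    return (1.0 - metallic) * diffuse + specular
```

Because every term is a cheap table lookup, the per-pixel cost is constant regardless of environment-map complexity, which is the source of the speedup over stochastic sampling.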
Occlusion Approximation and Efficiency
One of the challenges in dynamic scenes is efficiently approximating occlusion to handle shadow effects accurately. SGIA employs a hybrid approach wherein ambient occlusion is pre-computed using a low-resolution template mesh. This strategy effectively balances computational efficiency and accuracy by combining ray casting with mesh-based ambient occlusion, leveraging the small difference between occlusion effects on a clothed human versus an unclothed template.
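The ambient-occlusion precomputation can be illustrated as a hemisphere ray-casting estimate: the occlusion at a surface point is the fraction of upper-hemisphere directions blocked by the template mesh. This is a generic sketch, not the paper's implementation; the `ray_hits(origin, direction) -> bool` predicate is an assumed interface to a mesh ray-casting backend (e.g. a BVH over the low-resolution template).

```python
import numpy as np

def ambient_visibility(point, normal, ray_hits, n_samples=64, rng=None):
    """
    Monte-Carlo estimate of ambient visibility at a surface point: the
    fraction of hemisphere directions NOT blocked by the template mesh.
    Returns a value in [0, 1]; occlusion is 1 minus this value.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    normal = normal / np.linalg.norm(normal)
    unoccluded = 0
    for _ in range(n_samples):
        d = rng.normal(size=3)           # uniform direction on the sphere
        d /= np.linalg.norm(d)
        if np.dot(d, normal) < 0.0:      # flip into the upper hemisphere
            d = -d
        # Offset the origin slightly along the normal to avoid self-hits
        if not ray_hits(point + 1e-4 * normal, d):
            unoccluded += 1
    return unoccluded / n_samples
```

Since the template mesh is low-resolution and the casting happens once per pose rather than per rendered pixel, this stays cheap while still capturing the dominant self-shadowing of the body.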
Progressive Training Strategy
For training, SGIA utilizes a progressive two-stage strategy. Initially, the model focuses on reconstructing the avatar's rough shape from the monocular video using image reconstruction losses. In the subsequent stage, the model refines PBR attributes and geometry, leveraging a progressive normal alignment strategy to ensure that both the splat normals and the actual mesh geometry are accurately aligned. This progressive training approach ensures high fidelity in the final model's surface reconstruction and PBR properties estimation.
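A minimal sketch of the normal-alignment idea in the second stage: penalize the angular disagreement between per-splat normals and surface normals, with the penalty weight ramped up progressively so early geometry optimization is not over-constrained. The loss form and linear schedule here are illustrative assumptions, not the paper's exact terms or hyperparameters.

```python
import numpy as np

def normal_alignment_loss(splat_normals, surface_normals, weight):
    """
    Alignment term between per-splat normals and surface normals derived
    from the rendered geometry. Both inputs are (N, 3) unit vectors; the
    loss is the weighted mean of (1 - cosine similarity).
    """
    cos = np.sum(splat_normals * surface_normals, axis=-1)
    return weight * float(np.mean(1.0 - cos))

def alignment_weight(step, total_steps, max_weight=0.05):
    """Linearly ramp the alignment weight over the refinement stage
    (schedule shape and max_weight are assumed, not from the paper)."""
    return max_weight * min(step / max(total_steps, 1), 1.0)
```

Ramping the weight lets the first stage settle the coarse shape from photometric losses alone before the normals are pulled into agreement with the splat geometry.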
Experimental Results
Synthetic and Real-world Datasets
The paper demonstrates SGIA's performance on various datasets, including the synthetic RANA dataset and real-world datasets such as PeopleSnapshot and ZJU-MoCap. The experiments showcase SGIA's capability to reconstruct detailed and relightable human avatars efficiently, with quantitative metrics indicating significant improvements in PSNR, SSIM, and LPIPS scores compared to state-of-the-art methods like IntrinsicAvatar and Relighting4D (R4D). Specifically, SGIA achieves a notable reduction in normal error and an increase in albedo estimation accuracy, outperforming baselines by a substantial margin.
Speed and Efficiency
A key highlight of SGIA is its training speed and rendering efficiency. The method achieves training times approximately five times faster than leading methods and boasts rendering speeds up to 100 times faster. This efficiency does not come at the cost of quality, as SGIA maintains high accuracy in geometric reconstruction and PBR property estimation.
Implications and Future Directions
The implications of SGIA for fields such as virtual reality, gaming, and films are substantial. The ability to quickly and accurately reconstruct relightable dynamic human avatars from monocular videos opens up opportunities for more interactive and lifelike virtual environments. Moreover, the methodological advancements of SGIA, particularly its use of 2DGS and pre-integrated lighting, set a new standard for efficiency in inverse rendering tasks.
Future research could explore the integration of facial expression capture to further enhance the realism and expressiveness of generated avatars. Additionally, combining an expressive Gaussian avatar representation with SGIA's full PBR pipeline could yield more expressive and physically accurate virtual humans.
In summary, the SGIA framework marks a significant advancement in the efficient and accurate reconstruction of relightable dynamic human avatars from monocular inputs. It delivers tangible gains in both computational efficiency and the fidelity of the resulting avatars, paving the way for enhanced applications in various digital and interactive media.