- The paper presents the SGIA framework that employs a novel 2D Gaussian surfel representation with full PBR attributes for dynamic and accurate human reconstruction.
- It utilizes pre-integrated lighting techniques that separate the rendering equation into diffuse and specular components, enabling far faster light computation than traditional Monte Carlo sampling.
- Experimental results demonstrate significant improvements with up to 5x faster training and 100x faster rendering, achieving superior PSNR, SSIM, and LPIPS scores compared to state-of-the-art methods.
An Overview of Surfel-based Gaussian Inverse Rendering for Dynamic Human Reconstruction from Monocular Videos
The paper "Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Videos" by Yiqun Zhao et al. introduces a methodology named the Surfel-based Gaussian Inverse Avatar (SGIA). SGIA reconstructs and relights dynamic human avatars with Physically-Based Rendering (PBR) properties from monocular video inputs, offering considerable improvements over existing methods in both speed and accuracy of rendering under diverse lighting conditions.
Methodology and Innovations
Canonical PBR-aware 2DGS Representation
The SGIA framework employs a representation based on 2D Gaussians (2DGS) embedded in 3D space, which are referred to as surfels. These surfels include a comprehensive set of PBR attributes: position, rotation, scale, opacity, appearance, albedo, roughness, and metallic properties. This detailed representation allows for more precise and physically accurate rendering of human avatars under various lighting conditions.
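To make the attribute set concrete, the following is a minimal sketch of one PBR-aware surfel record. The class and field names are illustrative, not the paper's actual data structures; the normal is taken as the disk's local z-axis, which is the standard convention for 2D Gaussian surfels.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PBRSurfel:
    """Illustrative 2D Gaussian surfel carrying PBR attributes."""
    position: np.ndarray   # (3,) center in canonical space
    rotation: np.ndarray   # (4,) unit quaternion (w, x, y, z) orienting the disk
    scale: np.ndarray      # (2,) in-plane extents (2D Gaussian, zero thickness)
    opacity: float         # alpha used during splatting
    albedo: np.ndarray     # (3,) diffuse base color
    roughness: float       # scalar in [0, 1]
    metallic: float        # scalar in [0, 1]

    def normal(self) -> np.ndarray:
        """Surfel normal: the disk's local z-axis rotated into world space."""
        w, x, y, z = self.rotation
        # Third column of the quaternion's rotation matrix
        return np.array([2*(x*z + w*y), 2*(y*z - w*x), 1 - 2*(x*x + y*y)])
```

Because each surfel is a flat disk rather than a 3D ellipsoid, it carries a well-defined normal for free, which is what makes the representation convenient for inverse rendering.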
Linear Blend Skinning (LBS)
SGIA's body deformation is managed via Linear Blend Skinning (LBS) applied to the canonical 2DGS. The canonical Gaussians are first initialized based on a Skinned Multi-Person Linear Model (SMPL). Each Gaussian's parameters are then transformed to the observation space using LBS, allowing for realistic animation and adaptation to different poses.
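The LBS step above can be sketched as follows: each point's deformation matrix is a weighted blend of per-joint rigid transforms. This is a generic LBS implementation under assumed array shapes, not SGIA's actual code (which also transforms each Gaussian's rotation and covariance, omitted here for brevity).

```python
import numpy as np

def lbs_transform(x_canonical, bone_transforms, skinning_weights):
    """
    Deform canonical points into observation space via Linear Blend Skinning.
    x_canonical:      (N, 3) canonical Gaussian centers
    bone_transforms:  (J, 4, 4) per-joint rigid transforms for the current pose
    skinning_weights: (N, J) per-point weights, each row summing to 1
    Returns (N, 3) deformed positions.
    """
    # Blend the per-joint transforms into one 4x4 matrix per point
    blended = np.einsum('nj,jab->nab', skinning_weights, bone_transforms)
    # Apply each blended transform to its point in homogeneous coordinates
    x_h = np.concatenate([x_canonical, np.ones((len(x_canonical), 1))], axis=1)
    x_obs = np.einsum('nab,nb->na', blended, x_h)
    return x_obs[:, :3]
```

In practice the skinning weights for each Gaussian are inherited from the nearest SMPL vertices at initialization, so the deformation stays consistent with the underlying body model.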
Physically-Based Rendering with Pre-Integrated Lighting
SGIA introduces a technique leveraging image-based lighting and pre-integration to swiftly calculate light interactions, substantially faster than traditional Monte Carlo sampling. Specifically, this involves separating the rendering equation into diffuse and specular components and pre-integrating each term so it can be queried rapidly during rendering.
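The separation described above follows the general pattern of split-sum image-based lighting: the diffuse term becomes a single irradiance lookup by normal, and the specular term becomes a pre-filtered environment lookup modulated by a pre-integrated BRDF table. The sketch below illustrates that pattern under assumptions of my own; the pre-integrated maps are modeled as plain callables (in a real renderer they would be mip-mapped environment maps and a 2D LUT texture), and it is not a reproduction of the paper's exact formulation.

```python
import numpy as np

def shade_preintegrated(albedo, roughness, metallic, n, v,
                        irradiance_map, prefiltered_env, brdf_lut):
    """
    Split-sum shading sketch. The three map arguments are assumed callables:
      irradiance_map(normal)            -> (3,) pre-integrated diffuse light
      prefiltered_env(refl, roughness)  -> (3,) pre-filtered specular light
      brdf_lut(n_dot_v, roughness)      -> (scale, bias) pre-integrated BRDF
    """
    n = n / np.linalg.norm(n)
    v = v / np.linalg.norm(v)
    n_dot_v = max(float(np.dot(n, v)), 0.0)
    r = 2.0 * n_dot_v * n - v               # mirror reflection direction

    # Diffuse: light pre-integrated over the hemisphere around the normal
    diffuse = albedo * irradiance_map(n)

    # Specular: pre-filtered environment color times the pre-integrated
    # BRDF response, with base reflectance blended by the metallic value
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic
    scale, bias = brdf_lut(n_dot_v, roughness)
    specular = prefiltered_env(r, roughness) * (f0 * scale + bias)

    return (1.0 - metallic) * diffuse + specular
```

Because every term is a cheap table lookup, the per-pixel cost is constant regardless of environment-map complexity, which is the source of the speedup over stochastic sampling.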
Occlusion Approximation and Efficiency
One of the challenges in dynamic scenes is efficiently approximating occlusion to handle shadow effects accurately. SGIA employs a hybrid approach wherein ambient occlusion is pre-computed using a low-resolution template mesh. This strategy effectively balances computational efficiency and accuracy by combining ray casting with mesh-based ambient occlusion, leveraging the small difference between occlusion effects on a clothed human versus an unclothed template.
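The ambient-occlusion precomputation can be illustrated as a hemisphere ray-casting estimate: the occlusion at a surface point is the fraction of upper-hemisphere directions blocked by the template mesh. This is a generic sketch, not the paper's implementation; the `ray_hits(origin, direction) -> bool` predicate is an assumed interface to a mesh ray-casting backend (e.g. a BVH over the low-resolution template).

```python
import numpy as np

def ambient_visibility(point, normal, ray_hits, n_samples=64, rng=None):
    """
    Monte-Carlo estimate of ambient visibility at a surface point: the
    fraction of hemisphere directions NOT blocked by the template mesh.
    Returns a value in [0, 1]; occlusion is 1 minus this value.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    normal = normal / np.linalg.norm(normal)
    unoccluded = 0
    for _ in range(n_samples):
        d = rng.normal(size=3)           # uniform direction on the sphere
        d /= np.linalg.norm(d)
        if np.dot(d, normal) < 0.0:      # flip into the upper hemisphere
            d = -d
        # Offset the origin slightly along the normal to avoid self-hits
        if not ray_hits(point + 1e-4 * normal, d):
            unoccluded += 1
    return unoccluded / n_samples
```

Since the template mesh is low-resolution and the casting happens once per pose rather than per rendered pixel, this stays cheap while still capturing the dominant self-shadowing of the body.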
Progressive Training Strategy
For training, SGIA utilizes a progressive two-stage strategy. Initially, the model focuses on reconstructing the avatar's rough shape from the monocular video using image reconstruction losses. In the subsequent stage, the model refines PBR attributes and geometry, leveraging a progressive normal alignment strategy to ensure that both the splat normals and the actual mesh geometry are accurately aligned. This progressive training approach ensures high fidelity in the final model's surface reconstruction and PBR properties estimation.
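A minimal sketch of the normal-alignment idea in the second stage: penalize the angular disagreement between per-splat normals and surface normals, with the penalty weight ramped up progressively so early geometry optimization is not over-constrained. The loss form and linear schedule here are illustrative assumptions, not the paper's exact terms or hyperparameters.

```python
import numpy as np

def normal_alignment_loss(splat_normals, surface_normals, weight):
    """
    Alignment term between per-splat normals and surface normals derived
    from the rendered geometry. Both inputs are (N, 3) unit vectors; the
    loss is the weighted mean of (1 - cosine similarity).
    """
    cos = np.sum(splat_normals * surface_normals, axis=-1)
    return weight * float(np.mean(1.0 - cos))

def alignment_weight(step, total_steps, max_weight=0.05):
    """Linearly ramp the alignment weight over the refinement stage
    (schedule shape and max_weight are assumed, not from the paper)."""
    return max_weight * min(step / max(total_steps, 1), 1.0)
```

Ramping the weight lets the first stage settle the coarse shape from photometric losses alone before the normals are pulled into agreement with the splat geometry.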
Experimental Results
Synthetic and Real-world Datasets
The paper demonstrates SGIA's performance on various datasets, including the synthetic RANA dataset and real-world datasets such as PeopleSnapshot and ZJU-MoCap. The experiments showcase SGIA's capability to reconstruct detailed and relightable human avatars efficiently, with quantitative metrics indicating significant improvements in PSNR, SSIM, and LPIPS scores compared to state-of-the-art methods like IntrinsicAvatar and Relighting4D (R4D). Specifically, SGIA achieves a notable reduction in normal error and an increase in albedo estimation accuracy, outperforming baselines by a substantial margin.
Speed and Efficiency
A key highlight of SGIA is its training speed and rendering efficiency. The method achieves training times approximately five times faster than leading methods and boasts rendering speeds up to 100 times faster. This efficiency does not come at the cost of quality, as SGIA maintains high accuracy in geometric reconstruction and PBR property estimation.
Implications and Future Directions
The implications of SGIA for fields such as virtual reality, gaming, and films are substantial. The ability to quickly and accurately reconstruct relightable dynamic human avatars from monocular videos opens up opportunities for more interactive and lifelike virtual environments. Moreover, the methodological advancements of SGIA, particularly its use of 2DGS and pre-integrated lighting, set a new standard for efficiency in inverse rendering tasks.
Future research could explore the integration of facial expression capture to further enhance the realism and expressiveness of generated avatars. Additionally, combining an expressive Gaussian avatar representation with SGIA's full PBR pipeline could yield more expressive and physically accurate virtual humans.
In summary, the SGIA framework marks a significant advancement in the efficient and accurate reconstruction of relightable dynamic human avatars from monocular inputs. It delivers tangible gains in both computational efficiency and the fidelity of the resulting avatars, paving the way for enhanced applications in various digital and interactive media.