
EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Multi-view Camera Settings

Published 2 Oct 2024 in cs.CV | (2410.01425v2)

Abstract: Feed-forward based 3D Gaussian Splatting methods have demonstrated exceptional capability in real-time novel view synthesis for human models. However, current approaches are confined to either dense viewpoint configurations or restricted image resolutions. These limitations hinder their flexibility in free-viewpoint rendering across a wide range of camera view angle discrepancies, and also restrict their ability to recover fine-grained human details in real time using commonly available GPUs. To address these challenges, we propose a novel pipeline named EVA-Gaussian for 3D human novel view synthesis across diverse multi-view camera settings. Specifically, we first design an Efficient Cross-View Attention (EVA) module to effectively fuse cross-view information under high resolution inputs and sparse view settings, while minimizing temporal and computational overhead. Additionally, we introduce a feature refinement mechanism to predict the attributes of the 3D Gaussians and assign a feature value to each Gaussian, enabling the correction of artifacts caused by geometric inaccuracies in position estimation and enhancing overall visual fidelity. Experimental results on the THuman2.0 and THumanSit datasets showcase the superiority of EVA-Gaussian in rendering quality across diverse camera settings. Project page: https://zhenliuzju.github.io/huyingdong/EVA-Gaussian

Summary

  • The paper introduces EVA-Gaussian, a novel pipeline combining efficient cross-view attention and Gaussian attribute estimation for real-time 3D human reconstruction.
  • It employs a recurrent U-Net with anchor loss regularization to correct artifacts, achieving superior PSNR, SSIM, and LPIPS scores on benchmark datasets.
  • The approach proves effective under sparse and diverse camera setups, paving the way for advancements in AR/VR and real-time communication applications.

The paper "EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings" introduces a robust methodology that addresses key challenges in 3D human reconstruction and novel view synthesis. The proposed pipeline, EVA-Gaussian, delivers significant improvements in rendering quality and efficiency, particularly under sparse and diversely positioned camera settings.

Summary

The field of 3D reconstruction and novel view synthesis has seen substantial advancements, yet existing methodologies often rely on dense viewpoint settings or precise templates. These constraints limit the flexibility needed for real-time applications such as AR/VR and holographic communication. The proposed EVA-Gaussian pipeline addresses these limitations by incorporating three key innovations:

  1. Efficient Cross-View Attention (EVA) Module: This module enhances the estimation of 3D Gaussian positions from the source images. It embeds a dedicated cross-view attention mechanism within a U-Net, enabling accurate multi-view correspondence retrieval even in sparse camera setups.
  2. Gaussian Attribute Estimation: Once the positions are estimated, the remaining attributes—such as opacity, scales, quaternions, and additional features—are predicted using a shallow U-Net. This combination ensures comprehensive attribute estimation for each 3D Gaussian.
  3. Recurrent Feature Refinement: To correct artifacts caused by errors in the initial position estimates, a recurrent U-Net iteratively refines the rendered images. This refinement, together with an anchor loss function, further improves the visual fidelity of the synthesized views.
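The cross-view fusion in step 1 can be illustrated with plain scaled dot-product attention, where query tokens from one view attend over key/value tokens gathered from the other views. This is a minimal pure-Python sketch of the general mechanism, not the paper's actual EVA module, which operates on multi-scale U-Net feature maps with efficiency optimizations.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_view_attention(queries, keys, values):
    """Fuse feature tokens across views: each query token (from one view)
    attends over key/value tokens from the other views.
    queries, keys, values: lists of d-dimensional vectors (lists of floats)."""
    d = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product scores against every cross-view key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of cross-view value tokens.
        fused = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]
        out.append(fused)
    return out
```

In the real pipeline this fusion runs at several feature scales inside the U-Net, so correspondences can be found even when adjacent cameras are far apart.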

Results and Performance

The paper presents experimental results on the THuman2.0 and THumanSit datasets, showcasing superior performance of the EVA-Gaussian method over existing approaches such as GPS-Gaussian and ENeRF. For instance, EVA-Gaussian achieved the highest PSNR and SSIM scores, and the lowest LPIPS scores, while maintaining competitive inference times. Some key numerical outcomes were:

  • On THuman2.0 with an angle change of 45°, EVA-Gaussian achieved a PSNR of 31.11, outperforming GPS-Gaussian (30.30) and ENeRF (29.62).
  • On THumanSit with a similar setting, EVA-Gaussian reached a PSNR of 29.16, compared to GPS-Gaussian's 28.02 and ENeRF's 27.06.

These results illustrate EVA-Gaussian’s prowess in rendering high-fidelity images even with substantial viewpoint changes (e.g., 90°).

Methodological Contributions

The paper's primary contributions can be outlined as follows:

  • Pipeline Innovation: EVA-Gaussian integrates multi-view 3D Gaussian position estimation with efficient attention mechanisms, followed by attribute and recurrent feature refinement, ensuring robust and efficient 3D human reconstruction.
  • Cross-View Attention Mechanism: By embedding an efficient cross-view attention mechanism at multiple scales within the U-Net architecture, the method handles diverse viewing angles and significantly improves depth and Gaussian position estimation accuracy.
  • Anchor Loss Regularization: Introducing an anchor loss that penalizes inconsistencies in Gaussian attributes and face landmarks ensures higher accuracy in critical regions like human faces, thereby enhancing overall image fidelity.
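The anchor loss can be thought of as a penalty pulling predicted 3D Gaussian positions toward known anchor landmarks (e.g., detected face landmarks). The sketch below uses a simple weighted mean squared distance; the paper's exact formulation and weighting may differ.

```python
def anchor_loss(predicted, anchors, weight=1.0):
    """Weighted mean squared distance between predicted 3D Gaussian
    positions and their matched anchor landmarks.
    predicted, anchors: equal-length lists of (x, y, z) tuples."""
    assert len(predicted) == len(anchors), "each prediction needs an anchor"
    total = 0.0
    for p, a in zip(predicted, anchors):
        total += sum((pi - ai) ** 2 for pi, ai in zip(p, a))
    return weight * total / len(predicted)
```

Because the penalty concentrates on matched landmark regions such as the face, it regularizes exactly the areas where small geometric errors are most visually objectionable.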

Practical and Theoretical Implications

Practically, EVA-Gaussian’s ability to reconstruct high-fidelity 3D human models in real-time can revolutionize applications in AR/VR, holographic communications, and real-time teaching. These applications demand swift and accurate reconstructions under variable camera settings, aligning perfectly with EVA-Gaussian’s capabilities.

Theoretically, this work advances the understanding of how real-time 3D reconstruction can be achieved using Gaussian representations and deep learning. The cross-view attention approach combined with recurrent refinement introduces a novel way to enhance multi-view consistency and depth accuracy, setting a new benchmark for future research.

Future Directions

Future research may focus on optimizing the computational efficiency of the attention module to handle even higher resolution images and more extensive input views. Additionally, integrating depth information or overlap area detection techniques could further reduce redundancy and improve overall system performance. Exploring these directions can help in refining and expanding the applicability of the EVA-Gaussian approach across various real-world scenarios.

In summary, EVA-Gaussian represents a significant step forward in the field of 3D Gaussian-based human reconstruction and novel view synthesis. Its methodological innovations and superior performance metrics underline its potential for practical applications and pave the way for further advancements in this research area.
