- The paper introduces Dual Gaussian Splatting, a technique that uses dual-layered joint and skin Gaussians to model global motion and fine appearance details.
- It employs a sequential coarse-to-fine optimization strategy that refines volumetric representations and achieves up to 120-fold compression.
- Experimental results demonstrate superior PSNR, SSIM, and LPIPS metrics compared to state-of-the-art methods, enabling high-fidelity VR on low-end devices.
Dual Gaussian Splatting for Real-time Human-centric Volumetric Videos
In their paper "Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos," Jiang et al. present an advanced methodology for high-fidelity, real-time rendering and compression of volumetric videos. This paper addresses critical challenges in the domain of 3D and 4D content, specifically focusing on human performances within volumetric video. The novel approach, dubbed Dual Gaussian Splatting (DualGS), distinguishes itself through a compressed, high-quality spatio-temporal representation that enables immersive virtual reality (VR) experiences on low-end devices.
Introduction
The primary innovation in this paper is the DualGS representation, which uses a dual-layered system of Gaussians to independently model motion and appearance attributes. Traditional volumetric video production workflows depend heavily on mesh sequences and often require extensive manual intervention to stabilize these sequences, generating large asset sizes that inhibit broader adoption. DualGS eliminates these inefficiencies by representing motion through joint Gaussians and appearance through skin Gaussians.
Methodology
Dual-Gaussian Representation:
DualGS achieves efficient and accurate human performance tracking by initializing two distinct sets of Gaussians:
- Joint Gaussians: A compact set of Gaussians (~15,000) that captures global motion.
- Skin Gaussians: A larger set of Gaussians (~180,000) that represent visual details.
During initialization, joint Gaussians are first optimized to capture the performance’s global motion, with constraints applied to prevent overly skinny Gaussians and oversized structures. Each skin Gaussian is then anchored to multiple joint Gaussians through k-nearest neighbors (KNN), enabling spatial interpolation for motion representation while maintaining temporal coherence. This hierarchical structure substantially reduces motion redundancy and enhances tracking robustness.
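The anchoring and interpolation step can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the inverse-distance weighting is a generic choice standing in for whatever weighting scheme the authors use.

```python
import numpy as np

def anchor_skin_to_joints(skin_pos, joint_pos, k=4):
    """For each skin Gaussian, find its k nearest joint Gaussians (KNN)
    and derive normalized interpolation weights.
    Inverse-distance weighting is an illustrative assumption."""
    # Pairwise distances: (num_skin, num_joint)
    d = np.linalg.norm(skin_pos[:, None, :] - joint_pos[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]            # indices of k nearest joints
    nn_d = np.take_along_axis(d, idx, axis=1)     # their distances
    w = 1.0 / (nn_d + 1e-8)                       # closer joints weigh more
    w /= w.sum(axis=1, keepdims=True)             # weights sum to 1 per skin Gaussian
    return idx, w

def interpolate_motion(joint_disp, idx, w):
    """Propagate per-joint displacements to skin Gaussians by
    weighted averaging over each skin Gaussian's anchors."""
    return (joint_disp[idx] * w[..., None]).sum(axis=1)
```

Because the anchors and weights are fixed after initialization, the same interpolation is reused every frame, which is what makes the skin layer's motion cheap to drive from the much smaller joint layer.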
Sequential Optimization:
The methodology employs a coarse-to-fine optimization strategy across frames, divided into:
- Coarse Alignment: Focuses solely on joint Gaussians’ motion using a locally rigid regularizer and velocity prediction for robust tracking.
- Fine-grained Optimization: Updates both joint and skin Gaussian attributes. Here, skin Gaussian positions and rotations are interpolated from joint Gaussians to balance rendering quality and temporal consistency. A temporal regularization term further mitigates abrupt changes in Gaussian attributes across frames.
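Two ingredients of this stage lend themselves to a short sketch: the velocity-based warm start used in coarse alignment, and a temporal smoothness penalty on Gaussian attributes. Both functions below are hypothetical simplifications (a constant-velocity extrapolation and a plain L2 penalty); the paper's actual regularizers may take a different form.

```python
import numpy as np

def velocity_predicted_init(pos_prev, pos_prev2):
    """Warm-start the current frame's joint Gaussian positions by
    extrapolating each Gaussian with its previous-frame velocity
    (constant-velocity assumption for illustration)."""
    return pos_prev + (pos_prev - pos_prev2)

def temporal_regularizer(attrs_t, attrs_prev, weight=0.1):
    """Penalize abrupt per-Gaussian attribute changes between adjacent
    frames; a generic L2 smoothness term standing in for the paper's
    temporal regularization."""
    return weight * np.mean((attrs_t - attrs_prev) ** 2)
```

A good initialization from velocity prediction keeps the coarse alignment from falling into poor local minima under fast motion, while the temporal term damps flicker in the fine-grained stage.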
Compression Strategy:
DualGS makes these high-fidelity 4D assets viable on low-end devices. The proposed compression framework achieves ratios of up to 120:1, encoding each frame in roughly 350 KB. Key elements of this strategy include:
- Residual Vector Quantization (RVQ): Applied to joint Gaussians’ motion.
- Codec Compression: Utilized for skin Gaussians’ opacity and scaling, which are arranged into 2D look-up tables (LUTs).
- Persistent Codebook Compression: Handles spherical harmonic (SH) color attributes, greatly reducing storage requirements by clustering SH components and encoding them as persistent indices with length encoding.
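The core idea of residual vector quantization is easy to show in isolation: each stage quantizes the residual left over by the previous stage, so coarse codebooks capture the bulk of the signal and later stages refine it. This is a generic RVQ sketch (greedy nearest-neighbor search per stage), not the paper's codec; codebook sizes and training are omitted.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Greedy residual vector quantization.
    x: (N, D) vectors; codebooks: list of (K, D) arrays, one per stage.
    Returns one (N,) index array per stage."""
    residual = x.astype(np.float64).copy()
    codes = []
    for cb in codebooks:
        # Nearest code in this stage's codebook for each residual vector
        d = np.linalg.norm(residual[:, None, :] - cb[None, :, :], axis=-1)
        idx = d.argmin(axis=1)
        codes.append(idx)
        residual = residual - cb[idx]   # pass leftover error to the next stage
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected code vectors across stages."""
    return sum(cb[idx] for idx, cb in zip(codes, codebooks))
```

Storing a few small per-stage indices per Gaussian instead of full-precision motion vectors is what drives the large reduction in per-frame motion data.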
Results and Evaluation
The DualGS framework is validated through rigorous qualitative and quantitative comparisons against state-of-the-art dynamic rendering methods such as HumanRF, NeuS2, Spacetime Gaussian, and HiFi4G. The results show that DualGS achieves superior rendering quality while maintaining minimal storage overhead, consistently delivering higher PSNR, SSIM, and VMAF scores and lower LPIPS values.
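For reference, PSNR, the most common of these metrics, is a simple function of mean squared error. The snippet below is a standard definition, not code from the paper:

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values
    in [0, max_val]; higher is better."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM and LPIPS are structural and learned perceptual metrics, respectively, and typically come from libraries such as scikit-image or the `lpips` package rather than a hand-rolled formula.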
A comprehensive analysis of ablation studies further demonstrates the efficacy of the DualGS representation. Components such as velocity prediction, joint Gaussians, and coarse-to-fine optimization contribute significantly to the accurate rendering of complex human performances.
Practical Implementation and Implications
The dual Gaussian-based compression strategy makes real-time, high-fidelity VR rendering feasible even on mobile devices like smartphones and standalone VR headsets. The implementation of a Unity plugin and a DualGS player ensures seamless integration into conventional 3D rendering pipelines, facilitating the immersive experience.
Conclusion
Jiang et al.’s work offers a notable advancement in volumetric video rendering and compression. By introducing a dual Gaussian layer representation, this research significantly enhances both the fidelity and efficiency of rendering human performances. Future developments may explore more dynamic optimization strategies to further improve temporal coherence and accommodate topological changes, as well as integrate multi-modal inputs to drive animations.
References
The paper's reference list includes seminal works on dynamic human modeling, neural human representation, and volumetric video compression, reflecting the breadth and depth of research in this field. The authors acknowledge contributions from neural radiance fields, dynamic Gaussian splatting, and adaptive mesh compression, all of which underpin the innovations presented in this paper.