- The paper introduces a compact 2D Gaussian video representation that reduces storage and enables real-time volumetric video streaming on mobile devices.
- It employs a two-stage training strategy with hash encoding and fine-tuning using residual entropy and temporal losses for efficient motion estimation and compression.
- Experimental results demonstrate superior rendering quality, high FPS performance, and reduced storage requirements compared to existing mobile volumetric video methods.
V³: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
The paper "V³: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians" addresses a significant challenge in mobile volumetric video streaming and rendering. Traditional volumetric video methods impose high computational and storage costs, making them impractical on mobile devices. This work overcomes these limitations by encoding dynamic 3D Gaussians as streamable 2D videos that mobile hardware can decode and render efficiently.
Key Contributions
The paper introduces several important contributions:
- Compact 2D Gaussian Video Representation: The primary innovation lies in representing the attributes of dynamic 3D Gaussian Splatting (3DGS) as multiple 2D Gaussian videos, allowing hardware video codecs to handle the streaming efficiently. This 2D representation significantly reduces storage requirements and facilitates real-time rendering on mobile platforms.
- Two-Stage Training Strategy: The authors propose a two-stage training strategy to generate these compact representations efficiently. The first stage estimates motion between frames using hash encoding and a shallow MLP; subsequent pruning and fine-tuning stages ensure temporal continuity and reduce storage costs.
- Temporal Regularization: To maintain high temporal consistency in the 2D Gaussian videos, the authors introduce a residual entropy loss and a temporal loss, which help reduce the entropy of Gaussian attributes and enhance the robustness to quantization.
- Multi-Platform Compatibility: A companion V³ player is developed to decode and render the 2D Gaussian videos on various mobile platforms, demonstrating real-time streaming and rendering capabilities.
Methodology
The methodology centers on transforming 3DGS sequences into 2D Gaussian videos. The process starts with keyframe reconstruction using static 3DGS, followed by a motion estimation phase driven by a hash-encoded shallow MLP. Subsequent frames are fine-tuned with residual entropy and temporal losses to maintain consistency and reduce per-frame storage.
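The hash-encoded shallow MLP mentioned above can be illustrated with a minimal numpy sketch. This is not the paper's actual architecture: the table size, feature dimension, grid resolution, hash primes, and MLP widths below are all illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np

# Minimal sketch of hash-grid encoding + a shallow MLP predicting per-Gaussian
# motion offsets. All sizes are illustrative assumptions, not the paper's setup.

TABLE_SIZE = 2 ** 14   # hash table entries (assumption)
FEAT_DIM = 4           # features per entry (assumption)
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.int64)

rng = np.random.default_rng(0)
hash_table = rng.normal(0, 1e-2, size=(TABLE_SIZE, FEAT_DIM))

def hash_encode(xyz, resolution=64):
    """Look up features for 3D points via spatial hashing of voxel coordinates."""
    idx = np.floor(xyz * resolution).astype(np.int64)   # (N, 3) voxel coords
    h = idx * PRIMES
    key = (h[:, 0] ^ h[:, 1] ^ h[:, 2]) % TABLE_SIZE
    return hash_table[key]                              # (N, FEAT_DIM)

# Shallow 2-layer MLP mapping features -> 3D motion offset (untrained weights).
W1 = rng.normal(0, 0.1, size=(FEAT_DIM, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, size=(16, 3));        b2 = np.zeros(3)

def predict_motion(xyz):
    feats = hash_encode(xyz)
    hidden = np.maximum(feats @ W1 + b1, 0.0)  # ReLU
    return hidden @ W2 + b2                    # (N, 3) per-Gaussian offsets

positions = rng.uniform(0, 1, size=(1000, 3))  # Gaussian centers in [0, 1)^3
offsets = predict_motion(positions)
next_positions = positions + offsets           # warp Gaussians toward next frame
print(offsets.shape)  # (1000, 3)
```

In training, the hash table and MLP weights would be optimized jointly against a photometric loss; the sketch only shows the forward pass that makes per-frame motion estimation cheap.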
Keyframe Reconstruction: The initial keyframe is reconstructed from a neural mesh extracted via NeuS2. Its Gaussian attributes are then optimized, and redundant Gaussians are pruned to keep the model compact.
Two-Stage Training: The training is divided into two stages:
- Stage One (Motion Estimation): Utilizes a hash grid with a shallow MLP to estimate the motion of Gaussian splats between adjacent frames efficiently.
- Stage Two (Fine-Tuning): Adjusts Gaussian attributes using residual entropy and temporal losses, ensuring temporal consistency and efficient compression of the resulting 2D Gaussian videos.
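The two regularizers in stage two can be sketched as follows. The exact loss definitions are not given in this summary, so this is only a plausible minimal version, assuming per-Gaussian attributes are stored as (N, D) arrays per frame; a real training loop would also need a differentiable surrogate for the entropy term.

```python
import numpy as np

# Illustrative sketch of a temporal loss and a residual entropy loss over
# per-frame Gaussian attribute arrays. Assumed forms, not the paper's exact ones.

def temporal_loss(attr_t, attr_prev):
    """Penalize attribute change between adjacent frames (L1)."""
    return np.mean(np.abs(attr_t - attr_prev))

def residual_entropy_loss(attr_t, attr_prev, step=1.0 / 255.0):
    """Shannon entropy (bits/symbol) of the quantized inter-frame residual:
    lower entropy means the residual compresses better under a video codec."""
    residual = np.round((attr_t - attr_prev) / step).astype(np.int64)
    _, counts = np.unique(residual, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
prev = rng.uniform(0, 1, size=(1000, 8))
curr = prev + rng.normal(0, 0.002, size=prev.shape)  # temporally smooth update

loss = temporal_loss(curr, prev) + 0.01 * residual_entropy_loss(curr, prev)
print(float(loss))
```

Both terms pull in the same direction: small, low-entropy frame-to-frame residuals are exactly what hardware video codecs encode cheaply.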
Compression and Streaming: The Gaussian attributes are baked into 2D images in Morton order, so that Gaussians close together in 3D space land on nearby pixels and the video codec can exploit spatial redundancy. Applying different quantization settings to different Gaussian attributes further improves compression efficiency.
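Morton (Z-order) sorting itself is a standard technique and can be shown concretely. The grid resolution and image size below are illustrative assumptions; the attribute channel and counts are made up for the example.

```python
import numpy as np

# Sketch of Morton (Z-order) sorting used to bake per-Gaussian attributes into
# a 2D image: Gaussians close in 3D get nearby pixel positions, which helps a
# video codec exploit spatial redundancy. Sizes here are assumptions.

def morton3d(ix, iy, iz, bits=10):
    """Interleave the bits of three integer coordinates into one Morton code."""
    code = np.zeros_like(ix)
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code

rng = np.random.default_rng(0)
n = 64 * 64                                  # number of Gaussians = image pixels
positions = rng.uniform(0, 1, size=(n, 3))   # Gaussian centers in [0, 1)^3
colors = rng.uniform(0, 1, size=(n, 3))      # example attribute to bake

# Quantize positions onto a 1024^3 grid and sort by Morton code.
q = np.minimum((positions * 1024).astype(np.int64), 1023)
order = np.argsort(morton3d(q[:, 0], q[:, 1], q[:, 2]))

# Bake the sorted attribute channel into one 64x64 "Gaussian image" frame.
attr_image = colors[order].reshape(64, 64, 3)
print(attr_image.shape)  # (64, 64, 3)
```

Each attribute channel (position, rotation, scale, color, opacity) would be baked into its own image stream this way and quantized at a bit depth suited to its sensitivity.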
Experimental Results and Performance
The authors evaluated V³ on multiple datasets, including the ReRF and Actors-HQ datasets. The results highlight V³'s ability to achieve superior rendering quality compared to existing methods such as VideoRF, 3DGStream, HumanRF, and NeuS2, with significantly reduced storage requirements.
The comparative studies, summarized in quantitative metrics (PSNR, SSIM, training time, and storage size), affirm that V³ not only delivers higher quality but also maintains efficient training times and minimal storage footprints. Furthermore, the multi-platform runtime analysis demonstrates that V³ achieves high-FPS rendering performance, confirming its feasibility on mobile platforms.
Implications and Future Directions
The success of V³ in streaming and rendering volumetric video on mobile devices opens numerous practical avenues for real-time applications such as immersive experiences, remote collaboration, and entertainment. The compact and efficient nature of the proposed method holds promise for widespread adoption in mobile applications.
Future Developments:
- Optimizing Real-Time Reconstruction: Enhancing real-time generation capabilities could make V³ suitable for live streaming scenarios, expanding its practical use-cases.
- Handling Complex Scenes: Extending the approach to handle larger, more complex scenes with multiple objects or extensive human-object interactions could broaden its applicability.
- Further Compression Techniques: Exploring additional compression techniques tailored to the specific needs of volumetric data may yield even smaller models without compromising rendering quality.
Conclusion
V³ presents a novel and effective solution for rendering volumetric videos on mobile devices by leveraging streamable 2D dynamic Gaussians. The method's ability to produce high-quality, temporally consistent video streams with minimal storage requirements and efficient training times is a significant advancement in the field of neural rendering and mobile graphics.