DepthSplat: Connecting Gaussian Splatting and Depth (2410.13862v3)

Published 17 Oct 2024 in cs.CV

Abstract: Gaussian splatting and single-view depth estimation are typically studied in isolation. In this paper, we present DepthSplat to connect Gaussian splatting and depth estimation and study their interactions. More specifically, we first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features, leading to high-quality feed-forward 3D Gaussian splatting reconstructions. We also show that Gaussian splatting can serve as an unsupervised pre-training objective for learning powerful depth models from large-scale multi-view posed datasets. We validate the synergy between Gaussian splatting and depth estimation through extensive ablation and cross-task transfer experiments. Our DepthSplat achieves state-of-the-art performance on ScanNet, RealEstate10K and DL3DV datasets in terms of both depth estimation and novel view synthesis, demonstrating the mutual benefits of connecting both tasks. In addition, DepthSplat enables feed-forward reconstruction from 12 input views (512x960 resolutions) in 0.6 seconds.

Summary

The paper introduces a novel method that integrates Gaussian splatting with depth estimation to enhance both 3D reconstruction and novel view synthesis.
It fuses pre-trained monocular depth features with multi-view information to keep depth consistent in challenging cases such as occlusions and texture-less regions.
Experiments report an Abs Rel of 0.044 for depth estimation on ScanNet and a PSNR of 27.44 for novel view synthesis on RealEstate10K.

DepthSplat: Integrating Gaussian Splatting and Depth Estimation

The paper "DepthSplat: Connecting Gaussian Splatting and Depth" connects Gaussian splatting with depth prediction to improve both depth estimation and novel view synthesis. The two techniques have traditionally been studied in isolation; the paper exploits their complementary nature to improve both 3D reconstruction quality and depth accuracy.

Methodology

DepthSplat builds a robust multi-view depth model by integrating pre-trained monocular depth features. This integration helps keep depth consistent across views in cases that are hard for multi-view matching alone, such as occlusions and texture-less regions. The predicted depth maps define the Gaussian centers for 3D reconstruction, and a differentiable splatting operation renders novel views from the resulting Gaussians.
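One common way to turn a predicted depth map into 3D centers is to unproject each pixel through the camera model. The sketch below illustrates that step under stated assumptions: a pinhole camera with intrinsics `K`, a 4x4 camera-to-world pose, integer pixel coordinates with no half-pixel offset, and one point per pixel. The function name `unproject_depth` and all parameter names are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to (H*W, 3) world-space points.

    Assumes a pinhole camera; depth is measured along the camera z-axis.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    pixels = pixels.reshape(-1, 3).astype(np.float64)
    # Back-project pixels to camera-space rays with z = 1, then scale by depth.
    rays = pixels @ np.linalg.inv(K).T
    points_cam = rays * depth.reshape(-1, 1)
    # Apply the rigid camera-to-world transform.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return points_cam @ R.T + t

# Toy example: a 3x4 image at constant depth 2, identity pose.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 2.0)
centers = unproject_depth(depth, K, np.eye(4))
```

In this toy setup the pixel at the principal point (column 2, row 1) maps to a point on the optical axis at depth 2. In a full pipeline, each center would also need the remaining Gaussian parameters (opacity, covariance, color), which this sketch does not cover.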

The paper also shows that Gaussian splatting can serve as an unsupervised pre-training objective for depth models, using large-scale posed multi-view datasets without depth labels. Combining monocular and multi-view depth features addresses limitations that each approach has when used on its own.

Experimental Results

DepthSplat achieves state-of-the-art results in both depth estimation and novel view synthesis on ScanNet, RealEstate10K, and DL3DV.

ScanNet Depth Estimation: The method achieved an Abs Rel (absolute relative error, lower is better) of 0.044, outperforming previous methods such as DeepV2D and UniMatch.
RealEstate10K Novel View Synthesis: DepthSplat reached a PSNR of 27.44 dB (higher is better), surpassing models like pixelSplat and MVSplat.
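For readers unfamiliar with the two metrics, the sketch below uses their standard textbook definitions. The evaluation protocol behind the reported numbers (valid-pixel mask, depth range, image cropping) is not described in this summary, so the masking rule `gt > 0` and the `[0, 1]` image range are assumptions, and the function names are illustrative.

```python
import numpy as np

def abs_rel(pred, gt):
    """Mean absolute relative depth error over pixels with valid ground truth."""
    valid = gt > 0  # assumption: non-positive ground truth marks missing depth
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val].

    Undefined (infinite) when the two images are identical.
    """
    mse = np.mean((rendered - target) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Toy depth example: the last pixel has no ground truth and is ignored.
gt_depth = np.array([1.0, 2.0, 4.0, 0.0])
pred_depth = np.array([1.1, 1.8, 4.0, 3.0])
depth_error = abs_rel(pred_depth, gt_depth)

# Toy image example: a uniform error of 0.1 gives an MSE of 0.01.
target_img = np.zeros((4, 4, 3))
rendered_img = np.full((4, 4, 3), 0.1)
image_psnr = psnr(rendered_img, target_img)
```

On these toy inputs the per-pixel relative errors are 0.1, 0.1, and 0, so the Abs Rel is their mean, and an MSE of 0.01 on a unit range corresponds to 20 dB.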

These results indicate that combining Gaussian splatting and depth estimation benefits both tasks.

Implications and Future Directions

The paper's findings suggest several implications and potential future developments:

Theoretical Advancements: By demonstrating a robust interaction between Gaussian splatting and depth estimation, this research suggests new avenues for theoretical exploration in 3D reconstruction and photometric consistency.
Practical Applications: The integration could benefit applications in augmented reality, autonomous vehicles, and robotics by providing more accurate and consistent 3D models and depth predictions.
Unsupervised Learning: Using Gaussian splatting as an unsupervised pre-training objective shows the potential of large multi-view datasets without depth labels, easing the shortage of labelled data that constrains supervised approaches.

Future research can explore removing the reliance on camera pose inputs, thereby broadening the applicability of this technique in situations where pose information is unreliable or unavailable. Additionally, exploring the scalability of this model to more complex scenes and higher resolutions could further expand its practical use cases.

In conclusion, DepthSplat is a noteworthy contribution to computer vision. Connecting depth estimation and Gaussian splatting improves performance on benchmark tasks and opens a path toward unsupervised pre-training for depth prediction.
