- The paper introduces a novel zero-shot method that predicts 3D Gaussian splats from uncalibrated stereo images.
- It leverages a feed-forward neural network with an extended Gaussian splatting head and loss masking strategy to enhance reconstruction accuracy.
- Experimental results on ScanNet++ show significant improvements in PSNR and SSIM over baselines, demonstrating its robustness in novel view synthesis.
Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs
Introduction
The paper "Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs" introduces a novel method for 3D reconstruction and novel view synthesis from stereo pairs without pre-existing camera parameters or depth information. Building on the MASt3R framework, Splatt3R offers a significant advancement by predicting 3D Gaussian splats to create photorealistic images from minimal input data. This essay will critically analyze the methodology, results, and implications of Splatt3R, exploring its potential to influence future developments in AI-driven 3D modeling and novel view synthesis.
Methodology
Splatt3R uses a feed-forward neural network to predict 3D Gaussian splats directly from uncalibrated stereo images. Its architecture extends MASt3R, which predicts per-pixel 3D points, with an additional Gaussian prediction head. For each pixel, this head estimates the remaining Gaussian parameters: a covariance (parameterized by a rotation and per-axis scales), spherical harmonics for color, an opacity, and a positional offset that is added to the predicted 3D point to obtain the Gaussian mean, as illustrated in the sketch below.
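To make the head's role concrete, here is a minimal PyTorch sketch of a per-pixel Gaussian parameter head. The layer sizes, names, and the single 1x1 convolution are illustrative assumptions, not the paper's actual decoder design (Splatt3R uses a DPT-style head); the sketch only shows which quantities are predicted and how they are activated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianHead(nn.Module):
    """Hypothetical per-pixel Gaussian parameter head (illustrative only).

    For every pixel it predicts a positional offset (added to the
    MASt3R-style 3D point to obtain the Gaussian mean), a rotation
    quaternion and log-scales defining the covariance, an opacity,
    and spherical-harmonic color coefficients.
    """

    def __init__(self, feat_dim: int = 256, sh_degree: int = 0):
        super().__init__()
        self.n_sh = 3 * (sh_degree + 1) ** 2           # RGB SH coefficients
        out_dim = 3 + 4 + 3 + 1 + self.n_sh           # offset, quat, log-scale, opacity, SH
        self.proj = nn.Conv2d(feat_dim, out_dim, kernel_size=1)

    def forward(self, feats: torch.Tensor, points: torch.Tensor):
        # feats:  (B, C, H, W) decoder features
        # points: (B, 3, H, W) per-pixel 3D points predicted by the backbone
        out = self.proj(feats)
        offset, quat, log_scale, opacity, sh = torch.split(
            out, [3, 4, 3, 1, self.n_sh], dim=1)
        return {
            "means": points + offset,               # Gaussian centers
            "rotations": F.normalize(quat, dim=1),  # unit quaternions
            "scales": torch.exp(log_scale),         # positive axis scales
            "opacities": torch.sigmoid(opacity),    # in (0, 1)
            "sh": sh,                               # color coefficients
        }
```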
The design exploits the architectural resemblance between generalizable 3D-GS methods (such as pixelSplat and MVSplat) and MASt3R's cross-attention network. To improve training, the paper introduces a loss masking strategy that excludes target-view pixels not visible from the input views, so the Gaussian primitives are supervised only where the scene content could actually have been observed; a sketch of the idea follows.
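A minimal sketch of masked supervision, assuming a precomputed visibility mask and hypothetical tensor names (the paper derives the mask by reprojecting context-view geometry into the target view, and its full loss also includes a perceptual term):

```python
import torch

def masked_render_loss(pred_rgb: torch.Tensor,
                       target_rgb: torch.Tensor,
                       valid_mask: torch.Tensor,
                       lpips_fn=None) -> torch.Tensor:
    """Hypothetical masked reconstruction loss.

    valid_mask marks target-view pixels that are visible from the
    context views; pixels outside the mask are excluded so the model
    is never penalized for content it could not have reconstructed.
    """
    mask = valid_mask.float()
    # Mean squared error restricted to visible pixels.
    mse = ((pred_rgb - target_rgb) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)
    loss = mse
    if lpips_fn is not None:
        # Optional perceptual term on masked images (a simplification
        # of the paper's actual training objective).
        loss = loss + lpips_fn(pred_rgb * mask, target_rgb * mask).mean()
    return loss
```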
Experimental Results
The authors evaluate Splatt3R on the ScanNet++ dataset, where it shows substantial improvements over existing methods. Splatt3R outperforms the raw MASt3R point cloud and pixelSplat (evaluated with both ground-truth and estimated camera poses) in PSNR, SSIM, and LPIPS across test subsets spanning different baseline distances and degrees of view overlap, achieving PSNR values between 19.18 and 19.66 and clearly surpassing the baselines. The method also maintains accuracy and photorealism on in-the-wild data, demonstrating its robustness and adaptability.
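For reference, PSNR is computed from the mean squared error between the rendered and ground-truth images; a brief sketch of the standard definition (not code from the paper) is below.

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```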
The paper attributes these performance gains to effective modeling of 3D Gaussian primitives and the loss masking approach, which prevents the model from making counterproductive updates based on unseen parts of the scene.
Implications and Future Directions
The implications of Splatt3R are multifaceted. Practically, synthesizing novel views from uncalibrated image pairs, without pre-existing depth or camera parameters, significantly lowers the barrier to high-quality 3D reconstruction, making detailed 3D modeling accessible from minimal input data for applications in VR/AR, gaming, and digital heritage preservation.
Theoretically, Splatt3R's success highlights the potential of incorporating Gaussian splats within feed-forward models for generalizable novel view synthesis. It opens pathways for integrating other forms of geometric primitives into neural networks, which could result in more efficient and accurate 3D modeling techniques.
In terms of future developments, potential directions include exploring more sophisticated color modeling techniques, such as higher-degree spherical harmonics, or hybrid approaches that combine neural and analytical methods for scene representation. Further, integrating Splatt3R into larger pipelines that handle dynamic scenes or multi-object environments could yield broader applications.
Conclusion
Splatt3R presents a novel, effective solution for 3D reconstruction and novel view synthesis from uncalibrated image pairs. By building upon the MASt3R framework and introducing Gaussian splatting along with a robust loss masking strategy, it significantly advances the field of neural scene representation. The paper's results indicate strong practical and theoretical implications, underscoring its potential to shape future research and applications in AI-driven 3D modeling.