FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction (2412.09573v1)

Published 12 Dec 2024 in cs.CV

Abstract: Existing sparse-view reconstruction models heavily rely on accurate known camera poses. However, deriving camera extrinsics and intrinsics from sparse-view images presents significant challenges. In this work, we present FreeSplatter, a highly scalable, feed-forward reconstruction framework capable of generating high-quality 3D Gaussians from uncalibrated sparse-view images and recovering their camera parameters in mere seconds. FreeSplatter is built upon a streamlined transformer architecture, comprising sequential self-attention blocks that facilitate information exchange among multi-view image tokens and decode them into pixel-wise 3D Gaussian primitives. The predicted Gaussian primitives are situated in a unified reference frame, allowing for high-fidelity 3D modeling and instant camera parameter estimation using off-the-shelf solvers. To cater to both object-centric and scene-level reconstruction, we train two model variants of FreeSplatter on extensive datasets. In both scenarios, FreeSplatter outperforms state-of-the-art baselines in terms of reconstruction quality and pose estimation accuracy. Furthermore, we showcase FreeSplatter's potential in enhancing the productivity of downstream applications, such as text/image-to-3D content creation.

Summary

  • The paper introduces a pose-free Gaussian splatting framework that leverages a transformer architecture to enable real-time 3D reconstruction without calibrated camera poses.
  • It constructs efficient Gaussian maps from multi-view image tokens, bypassing the need for extensive image overlap or pre-aligned camera parameters.
  • Experiments demonstrate that FreeSplatter variants achieve significant PSNR gains and robust zero-shot generalization across object-centric and scene-level tasks.

Overview of "FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction"

The paper introduces FreeSplatter, a framework for 3D reconstruction that operates without calibrated camera poses. The approach departs from traditional methods that require accurate extrinsic and intrinsic camera parameters, offering instead a scalable, feed-forward pipeline that reconstructs 3D Gaussians and recovers camera parameters within seconds.

Methodological Insights

FreeSplatter is built around a streamlined transformer architecture: multi-view image tokens exchange information through sequential self-attention blocks and are decoded into pixel-wise 3D Gaussian primitives. By eliminating the need for predefined camera parameters, FreeSplatter sidesteps a key bottleneck of traditional sparse-view reconstruction models, which typically rely on substantial image overlap or pre-aligned camera poses. A distinctive aspect of the model is its prediction of "Gaussian maps": per-pixel Gaussian primitives expressed in a unified reference frame, which support high-fidelity 3D modeling and allow camera poses to be recovered almost instantly with off-the-shelf solvers.
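To make the token-to-Gaussian mapping concrete, the following is a minimal sketch of such a feed-forward pipeline, assuming a ViT-style patch embedding, plain self-attention blocks, and a 14-channel per-pixel Gaussian parameterization (center, scale, rotation, opacity, color). The class name, dimensions, and layer choices are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal PyTorch sketch of a FreeSplatter-style feed-forward reconstructor.
# All names, sizes, and the 14-channel parameterization are assumptions.
import torch
import torch.nn as nn

class GaussianTransformer(nn.Module):
    def __init__(self, patch=8, dim=1024, depth=24, heads=16):
        super().__init__()
        self.patch = patch
        # Linearly embed each patch of each view into a token.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True, norm_first=True)
        # Sequential self-attention blocks shared across all view tokens.
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        # Decode every token into patch x patch pixel-wise Gaussian parameters.
        self.to_gaussians = nn.Linear(dim, patch * patch * 14)

    def forward(self, views):                        # views: (B, N, 3, H, W)
        B, N, _, H, W = views.shape
        p, h, w = self.patch, H // self.patch, W // self.patch
        tokens = self.patch_embed(views.flatten(0, 1))      # (B*N, dim, h, w)
        tokens = tokens.flatten(2).transpose(1, 2)          # (B*N, h*w, dim)
        tokens = tokens.reshape(B, -1, tokens.shape[-1])    # join all views
        tokens = self.blocks(tokens)                        # cross-view attention
        gauss = self.to_gaussians(tokens)                   # (B, N*h*w, p*p*14)
        # Unpatchify: one 14-dim Gaussian per pixel, all views expressed in a
        # single shared reference frame.
        gauss = gauss.reshape(B, N, h, w, p, p, 14).permute(0, 1, 2, 4, 3, 5, 6)
        return gauss.reshape(B, N, H, W, 14)
```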

For practical deployment, the authors train two variants of FreeSplatter targeting different reconstruction scales: object-centric (FreeSplatter-O) and scene-level (FreeSplatter-S). Both share a transformer backbone of approximately 306 million parameters.
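Because the predicted Gaussian centers form a per-pixel point map in a single reference frame, each view's camera pose can be recovered by solving a perspective-n-point problem between pixel coordinates and the 3D centers predicted at those pixels. Below is a hedged sketch using OpenCV's PnP-RANSAC; the particular solver and the handling of intrinsics are assumptions, since the paper only states that off-the-shelf solvers are used.

```python
# Hedged sketch: recover one view's pose from its predicted Gaussian centers
# with an off-the-shelf PnP-RANSAC solver (OpenCV). The intrinsics K are
# assumed known or coarsely estimated; this is not the authors' exact recipe.
import cv2
import numpy as np

def estimate_pose(gaussian_centers, K):
    """gaussian_centers: (H, W, 3) per-pixel 3D centers in the shared frame."""
    H, W, _ = gaussian_centers.shape
    # Each pixel (u, v) observes the 3D point predicted at that pixel.
    # In practice a subsample of (foreground) pixels is sufficient.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pts_2d = np.stack([u, v], axis=-1).reshape(-1, 2).astype(np.float64)
    pts_3d = gaussian_centers.reshape(-1, 3).astype(np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None, reprojectionError=4.0)
    R, _ = cv2.Rodrigues(rvec)          # world-to-camera rotation
    return ok, R, tvec, inliers
```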

Key Results

FreeSplatter achieves clear improvements in reconstruction quality and camera pose estimation accuracy. FreeSplatter-O substantially outperforms prior large reconstruction models that depend on known poses, delivering sizable PSNR gains on unseen evaluation datasets. For scene-level tasks, FreeSplatter-S performs competitively against models such as MASt3R while estimating poses without any prior alignment. Notably, FreeSplatter exhibits robust zero-shot generalization across diverse datasets, which considerably extends its practical applicability.
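For reference, the PSNR figures cited above are the standard peak signal-to-noise ratio between rendered and ground-truth images; a minimal implementation of the metric (not tied to the paper's evaluation code) is:

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((rendered.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```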

Implications and Future Directions

The findings suggest notable implications for 3D content creation and downstream applications such as text-to-3D and image-to-3D generation. By removing the necessity for camera pose information, FreeSplatter simplifies the reconstruction pipeline, potentially enhancing productivity and broadening accessibility for 3D modeling tasks.

Looking ahead, the paper acknowledges certain limitations of the current framework, such as its reliance on depth data during training and the requirement of separate model variants for different reconstruction tasks. A unified model capable of seamlessly handling diverse reconstruction needs represents a promising avenue for future exploration. Additionally, extending training datasets to include scenarios without depth annotations remains a crucial challenge.

The FreeSplatter paper contributes a substantial advancement in the field of 3D reconstruction by challenging the entrenched dependency on camera calibration, thereby paving the way for more versatile and robust reconstruction models. The potential to streamline and transform content creation processes underscores its value to the field and sets a foundation for further exploration and optimization in pose-free 3D modeling.
