Single-model generalization across view counts, camera pose availability, and calibration

Develop a single feedforward 3D scene reconstruction model that generalizes across diverse settings: it should handle an arbitrary number of input views, operate effectively with both posed and unposed cameras and with both calibrated and uncalibrated intrinsics, and produce a unified 3D Gaussian Splatting representation.

Background

Feedforward Gaussian Splatting methods aim to reconstruct 3D scenes without per-scene optimization, but existing approaches typically assume accurate camera poses, known intrinsics, or a fixed small number of views. Real-world scenarios rarely meet all these assumptions simultaneously, requiring a model that can flexibly handle variable view counts, unknown camera poses, and uncalibrated intrinsics.

YoNoSplat is proposed to address these constraints: it predicts local Gaussians and a camera pose for each view, then aggregates them into a shared scene using either the predicted or ground-truth poses, alongside a scale-handling strategy based on intrinsic prediction and conditioning (a minimal sketch of this aggregation step follows). The quoted statement below highlights the broader challenge of achieving comprehensive generalization across all such conditions within a single model.
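To make the aggregation idea concrete, here is a minimal sketch assuming per-view Gaussians parameterized by centers and orientation matrices expressed in each camera's frame, plus 4x4 camera-to-world poses. The function name, tensor shapes, and PyTorch representation are illustrative assumptions, not the paper's implementation.

```python
import torch

def aggregate_local_gaussians(local_means, local_rotations, poses_pred, poses_gt=None):
    """Illustrative sketch (not YoNoSplat's actual code): fuse per-view Gaussians,
    expressed in each camera's local frame, into one global set by transforming
    them with camera-to-world poses.

    local_means:     list of (N_i, 3) tensors, Gaussian centers per view
    local_rotations: list of (N_i, 3, 3) tensors, Gaussian orientations per view
    poses_pred:      (V, 4, 4) camera-to-world poses predicted by the network
    poses_gt:        optional (V, 4, 4) ground-truth poses, used when available
    """
    # Use ground-truth poses if provided (posed setting), otherwise fall back
    # to the network's own pose predictions (unposed setting).
    poses = poses_gt if poses_gt is not None else poses_pred

    world_means, world_rotations = [], []
    for v, (mu, R_g) in enumerate(zip(local_means, local_rotations)):
        R, t = poses[v, :3, :3], poses[v, :3, 3]
        world_means.append(mu @ R.T + t)   # rotate and translate Gaussian centers
        world_rotations.append(R @ R_g)    # rotate Gaussian orientation matrices
    # Concatenate all views into a single unified Gaussian set.
    return torch.cat(world_means), torch.cat(world_rotations)
```

Because the same fusion operates on either predicted or ground-truth poses, a single model can serve both posed and unposed inputs; the view loop makes no assumption about the number of views.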

References

Designing a single model that generalizes across these diverse settings -- varying number of views, posed or unposed, calibrated or uncalibrated -- remains an open challenge.

YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting (2511.07321 - Ye et al., 10 Nov 2025) in Section 1, Introduction