Scalable, calibration-free monocular 3D reconstruction

Develop scalable, calibration-free methods for 3D reconstruction from monocular RGB image sequences. Such methods should recover accurate scene geometry and camera trajectories directly from uncalibrated inputs, without auxiliary sensors, enabling robust operation in diverse real-world environments.

Background

The paper highlights a recent shift toward feed-forward foundation models for 3D perception from uncalibrated images, yet notes that extending these models to large-scale monocular RGB streams is hampered by computational and memory constraints. Traditional approaches often rely on calibrated cameras or additional sensing modalities, which sidestep the core monocular problem.

Within this context, the authors explicitly identify the broader field-level challenge of achieving scalable and calibration-free 3D reconstruction from monocular RGB alone. Their proposed S-MUSt3R pipeline addresses scalability via sliding-window segmentation, combined with lightweight stitching and loop closure that require no retraining. Nevertheless, the statement in the introduction frames the underlying problem as still open, motivating continued research on robust monocular reconstruction at scale.
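To make the sliding-window idea concrete, the following is a minimal toy sketch of how a long frame stream can be split into overlapping windows, each reconstructed independently, and the per-window results stitched into one trajectory via their overlap. All function names, the window size, the overlap, and the 1-D "pose" representation are illustrative assumptions; they are not the actual S-MUSt3R components.

```python
# Hypothetical sketch: sliding-window segmentation plus overlap-based stitching.
# Poses are simplified to 1-D scalars so the alignment step stays readable.
from typing import List, Tuple


def sliding_windows(n_frames: int, size: int, overlap: int) -> List[Tuple[int, int]]:
    """Split the frame index range [0, n_frames) into overlapping windows."""
    assert 0 < overlap < size
    stride = size - overlap
    windows, start = [], 0
    while start + size < n_frames:
        windows.append((start, start + size))
        start += stride
    windows.append((start, n_frames))  # last window absorbs the remainder
    return windows


def stitch(trajectory: List[float], segment: List[float], overlap: int) -> List[float]:
    """Align a new window's relative poses to the running trajectory using the
    mean offset over the shared overlap frames, then append the new frames."""
    if not trajectory:
        return list(segment)
    shared_old = trajectory[-overlap:]   # overlap frames already placed globally
    shared_new = segment[:overlap]       # same frames, in the window's local frame
    offset = sum(o - n for o, n in zip(shared_old, shared_new)) / overlap
    return trajectory + [p + offset for p in segment[overlap:]]
```

A per-window reconstructor (a feed-forward model in the paper's setting) would produce each `segment` in its own local coordinate frame; the stitching step above then resolves the frame-to-frame gauge by averaging over the overlap, which keeps memory bounded by the window size rather than the full sequence length.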

References

However, scalable and calibration-free 3D reconstruction from monocular RGB alone remains an open challenge, and addressing it is critical for a broad range of robotic and embodied agent navigation tasks in diverse real-world environments.

S-MUSt3R: Sliding Multi-view 3D Reconstruction (2602.04517 - Antsfeld et al., 4 Feb 2026) in Section 1 (Introduction)