End-to-end differentiable SfM-to-3DGS pipeline

Develop a fully end-to-end differentiable 3D reconstruction pipeline for joint pose–appearance optimization with 3D Gaussian Splatting (3DGS), in which gradients from the rendering loss propagate through the Structure-from-Motion (SfM) stages and into the feature extractor, so that the feature network learns representations optimized for downstream reconstruction quality.

Background

GloSplat performs joint pose–appearance optimization during 3D Gaussian Splatting but keeps feature extraction, matching, and pair selection as frozen preprocessing components. As a result, gradients do not flow back into these upstream modules, and only camera poses and Gaussian primitives are refined during training.

The authors explicitly note that making the entire pipeline differentiable, so that rendering losses can influence both SfM and the feature extractor, remains an open challenge. Achieving this would allow the feature network to learn features tailored to reconstruction quality rather than generic matching performance, but it would require significant engineering effort and may introduce stability problems during training.
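To make the gradient-flow issue concrete, the toy sketch below contrasts a hard nearest-neighbour match with a softmax-weighted match. It is not GloSplat's method: the one-parameter `describe` function stands in for a feature extractor, the squared position error stands in for a rendering loss, and every name and constant is hypothetical. Soft matching is only one well-known relaxation and is assumed here for illustration. Central finite differences are used in place of automatic differentiation to keep the example dependency-free.

```python
import math

# Toy setup: every name and number below is hypothetical.
SRC_INTENSITY = 0.4                 # one keypoint in image A
DST_INTENSITIES = [0.1, 0.5, 0.9]   # three candidate keypoints in image B
DST_POSITIONS = [0.0, 1.0, 2.0]     # 1D locations of those candidates
TARGET_POSITION = 1.2               # location the downstream loss prefers
TEMPERATURE = 0.05                  # softmax temperature for soft matching


def describe(w, intensity):
    """One-parameter stand-in for a feature extractor."""
    return math.tanh(w * intensity)


def hard_match(query, candidates):
    """Nearest-neighbour matching: returns the position of the best candidate."""
    best = min(range(len(candidates)), key=lambda i: (query - candidates[i]) ** 2)
    return DST_POSITIONS[best]


def soft_match(query, candidates):
    """Softmax-weighted matching: returns the expected candidate position."""
    logits = [-((query - c) ** 2) / TEMPERATURE for c in candidates]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return sum(wt * p for wt, p in zip(weights, DST_POSITIONS)) / total


def loss(w, soft):
    """Squared error on the matched position, standing in for a rendering loss."""
    query = describe(w, SRC_INTENSITY)
    candidates = [describe(w, x) for x in DST_INTENSITIES]
    matched = soft_match(query, candidates) if soft else hard_match(query, candidates)
    return (matched - TARGET_POSITION) ** 2


def finite_difference(w, soft, eps=1e-4):
    """Central-difference estimate of d(loss)/dw."""
    return (loss(w + eps, soft) - loss(w - eps, soft)) / (2 * eps)


hard_grad = finite_difference(1.0, soft=False)
soft_grad = finite_difference(1.0, soft=True)
print("hard matching gradient:", hard_grad)
print("soft matching gradient:", soft_grad)
```

With hard matching, a small change in the extractor parameter `w` selects the same candidate, so the loss is locally constant and carries no training signal for the extractor. The soft variant yields a non-zero derivative, which is the property an end-to-end pipeline would need at every SfM stage. A real system would obtain these gradients from an autodiff framework rather than finite differences.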

References

Second, a fully end-to-end differentiable approach, where gradients from rendering losses flow back through SfM and into the feature extractor itself, remains an open challenge. Such a unified architecture would enable the feature network to learn representations optimized for downstream reconstruction quality rather than generic matching performance, though this requires significant engineering effort and may introduce stability challenges during training.

GloSplat: Joint Pose-Appearance Optimization for Faster and More Accurate 3D Reconstruction (2603.04847 - Xiong et al., 5 Mar 2026) in Limitations and Future Work, Section 5 (Conclusion)