Optimal Vision Foundation Model backbone for correspondence estimation
Determine the optimal Vision Foundation Model—considering both 2D-pretrained and 3D-pretrained architectures—to serve as the feature extractor backbone for dense correspondence estimation between image pairs, evaluating accuracy and robustness across domains and viewpoints.
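To make the evaluation setup concrete, the sketch below shows one common way a VFM backbone is used for dense correspondence: patch features from each image are L2-normalized and matched by mutual nearest neighbors. This is a minimal illustration, not SPIDER's implementation; the `dense_correspondences` function, the generic backbone interface, and the feature dimensions are assumptions, and the backbone under test (2D-pretrained or 3D-pretrained) would simply be swapped in as the feature source.

```python
# Hypothetical sketch: comparing VFM backbones for dense correspondence
# via mutual nearest-neighbor matching of patch features. The backbone
# interface and feature shapes are illustrative assumptions, not SPIDER's API.
import torch
import torch.nn.functional as F


def dense_correspondences(feat_a: torch.Tensor, feat_b: torch.Tensor):
    """Match patch features between two images.

    feat_a, feat_b: (C, H, W) dense feature maps produced by any candidate
    VFM backbone (2D-pretrained or 3D-pretrained). Returns index pairs of
    mutual nearest neighbors over the flattened patch grids.
    """
    C, _, _ = feat_a.shape
    fa = F.normalize(feat_a.reshape(C, -1).T, dim=1)   # (Na, C) unit-norm patch features
    fb = F.normalize(feat_b.reshape(C, -1).T, dim=1)   # (Nb, C)
    sim = fa @ fb.T                                    # cosine similarity matrix (Na, Nb)
    nn_ab = sim.argmax(dim=1)                          # best match A -> B
    nn_ba = sim.argmax(dim=0)                          # best match B -> A
    mutual = nn_ba[nn_ab] == torch.arange(fa.shape[0]) # keep cycle-consistent matches
    idx_a = torch.nonzero(mutual).squeeze(1)
    idx_b = nn_ab[idx_a]
    return idx_a, idx_b


if __name__ == "__main__":
    # Stand-in random features; in practice these would come from the VFM
    # backbone being evaluated, and match accuracy/robustness would be
    # compared across backbones, domains, and viewpoint changes.
    feat_a = torch.randn(384, 32, 32)
    feat_b = torch.randn(384, 32, 32)
    ia, ib = dense_correspondences(feat_a, feat_b)
    print(f"{ia.numel()} mutual matches out of {32 * 32} patches")
```

Under this kind of harness, only the feature extractor changes between runs, so differences in correspondence accuracy can be attributed to the backbone choice being investigated.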
References
While previous methods have shown that both 2D and 3D VFMs are helpful as the feature extractor backbone for correspondence estimation, the optimal VFM choice remains an open question.
— SPIDER: Spatial Image CorresponDence Estimator for Robust Calibration
(arXiv:2511.17750, Shao et al., 21 Nov 2025), Section 3.1 (Multi-Scale 3D Vision Foundation Models)