ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models (2312.03032v3)
Abstract: State-of-the-art 3D point cloud registration methods rely on labeled 3D datasets for training, which limits their practical applicability and often hinders generalization to unseen scenes. Leveraging the zero-shot capabilities of foundation models offers a promising solution to these challenges. In this paper, we introduce ZeroReg, a zero-shot registration approach that utilizes 2D foundation models to predict 3D correspondences. Specifically, ZeroReg adopts an object-to-point matching strategy, starting with object localization and semantic feature extraction from multi-view images using foundation models. In the object matching stage, semantic features help identify correspondences between objects across views. However, relying solely on semantic features can lead to ambiguity, especially in scenes with multiple instances of the same category. To address this, we construct scene graphs that capture spatial relationships among objects and apply a graph matching algorithm to accurately identify matched objects. Finally, within matched object regions, fine-grained point-level correspondences are computed with algorithms such as SuperGlue and LoFTR, yielding robust point cloud registration. Evaluations on benchmarks such as 3DMatch, 3DLoMatch, and ScanNet demonstrate ZeroReg's competitive performance, highlighting its potential to advance point cloud registration by integrating semantic features from foundation models.
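The abstract outlines a pipeline of semantic object matching, scene-graph disambiguation, and point-level refinement. Below is a minimal sketch of the object matching stage, under several assumptions not stated in the abstract: each view has already been reduced (by the foundation models) to per-object semantic feature vectors and 3D centroids, the scene graph is represented simply by pairwise centroid distances, and SciPy's Hungarian solver stands in for the paper's actual graph matching algorithm. The SuperGlue/LoFTR point matching step is delegated to those libraries and not shown; only the final rigid alignment from matched points (a standard Kabsch solve, not specific to the paper) is included.

```python
# Hypothetical sketch of ZeroReg-style object matching across two views.
# Inputs (assumed, not from the paper): feats_* are (N, D) semantic feature
# arrays and cents_* are (N, 3) object centroids per view.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_objects(feats_a, cents_a, feats_b, cents_b, w_spatial=0.5):
    """Match objects across views via semantic similarity plus a scene-graph
    term rewarding consistent pairwise centroid distances."""
    # Cosine similarity between L2-normalized semantic features.
    fa = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    fb = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sem = fa @ fb.T  # (Na, Nb)

    # Scene graphs as pairwise centroid-distance matrices within each view.
    da = np.linalg.norm(cents_a[:, None] - cents_a[None], axis=-1)  # (Na, Na)
    db = np.linalg.norm(cents_b[:, None] - cents_b[None], axis=-1)  # (Nb, Nb)

    # Seed an assignment from semantics alone, then score each candidate
    # pair (i, j) by how well i's distances to seeded objects in view A
    # agree with j's distances to their counterparts in view B. This is
    # what disambiguates repeated instances of the same category.
    seed_a, seed_b = linear_sum_assignment(-sem)
    spatial = np.zeros_like(sem)
    for i in range(sem.shape[0]):
        for j in range(sem.shape[1]):
            spatial[i, j] = np.exp(-np.abs(da[i, seed_a] - db[j, seed_b]).mean())

    score = (1.0 - w_spatial) * sem + w_spatial * spatial
    rows, cols = linear_sum_assignment(-score)
    return list(zip(rows.tolist(), cols.tolist()))

def rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch/SVD) from matched 3D points,
    e.g. the point correspondences produced by SuperGlue or LoFTR."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # reflection guard
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

# Toy usage with random stand-in inputs:
rng = np.random.default_rng(0)
pairs = match_objects(rng.normal(size=(5, 128)), rng.normal(size=(5, 3)),
                      rng.normal(size=(6, 128)), rng.normal(size=(6, 3)))
```

The pairwise-distance term is one plausible reading of the scene-graph matching described above; the paper may use a richer graph structure or a dedicated graph matching solver, so treat this as an illustration of the idea rather than the method itself.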