SACReg: Scene-Agnostic Coordinate Regression for Visual Localization (2307.11702v3)
Abstract: Scene coordinate regression (SCR), i.e., predicting 3D coordinates for every pixel of a given image, has recently shown promising potential. However, existing methods remain limited to small scenes memorized during training, and thus hardly scale to realistic datasets and scenarios. In this paper, we propose a generalized SCR model that is trained once and can be deployed in new test scenes, regardless of their scale, without any finetuning. Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D-pixel-to-3D-coordinate annotations, extracted from, e.g., off-the-shelf Structure-from-Motion or RGB-D data, and a query image, for which it predicts a dense 3D coordinate map and its confidence using cross-attention. At test time, we rely on existing off-the-shelf image retrieval systems and fuse the predictions from a shortlist of database images relevant to the query. The camera pose is then obtained using standard Perspective-n-Point (PnP). Starting from self-supervised CroCo pretrained weights, we train our model on diverse datasets to ensure generalizability across various scenarios, and significantly outperform other scene regression approaches, including scene-specific models, on multiple visual localization benchmarks. Finally, we show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
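The test-time step described in the abstract (fusing predictions from several retrieved database images before running PnP) can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the per-pixel "keep the most confident prediction" rule below is an assumption, and the function names `fuse_by_confidence` and `select_correspondences`, the flat pixel layout, and the `min_conf` threshold are all hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[int, int]


def fuse_by_confidence(
    coords: Sequence[Sequence[Point3D]],
    confs: Sequence[Sequence[float]],
) -> Tuple[List[Point3D], List[float]]:
    """Fuse K per-pixel predictions into one map (assumed rule: max confidence).

    coords[k][i] is the 3D point predicted for query pixel i when the model
    is conditioned on database image k; confs[k][i] is its confidence.
    """
    num_pixels = len(coords[0])
    fused_pts: List[Point3D] = []
    fused_conf: List[float] = []
    for i in range(num_pixels):
        # Index of the database image whose prediction is most confident here.
        best_k = max(range(len(coords)), key=lambda k: confs[k][i])
        fused_pts.append(coords[best_k][i])
        fused_conf.append(confs[best_k][i])
    return fused_pts, fused_conf


def select_correspondences(
    fused_pts: Sequence[Point3D],
    fused_conf: Sequence[float],
    width: int,
    min_conf: float,
) -> List[Tuple[Pixel, Point3D]]:
    """Keep 2D-3D pairs whose confidence is at least min_conf (hypothetical threshold).

    Pixels are assumed to be stored row-major, so pixel i sits at
    column i % width and row i // width.
    """
    pairs: List[Tuple[Pixel, Point3D]] = []
    for i, (pt, conf) in enumerate(zip(fused_pts, fused_conf)):
        if conf >= min_conf:
            pairs.append(((i % width, i // width), pt))
    return pairs


# Toy example: a 2x2 query image and two retrieved database images.
coords = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 0.0)],
]
confs = [
    [0.9, 0.2, 0.5, 0.1],
    [0.3, 0.8, 0.4, 0.2],
]
fused_pts, fused_conf = fuse_by_confidence(coords, confs)
pairs = select_correspondences(fused_pts, fused_conf, width=2, min_conf=0.5)
```

The resulting `pairs` list of pixel-to-3D correspondences is what a standard PnP solver (typically wrapped in RANSAC) would consume to estimate the camera pose; that solver is left out here to keep the sketch dependency-free.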