D2S: Representing sparse descriptors and 3D coordinates for camera relocalization (2307.15250v4)

Published 28 Jul 2023 in cs.CV and cs.RO

Abstract: State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant costs in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent complex local descriptors and their scene coordinates. Our method is characterized by its simplicity and cost-effectiveness. It solely leverages a single RGB image for localization during the testing phase and only requires a lightweight model to encode a complex sparse scene. The proposed D2S employs a combination of a simple loss function and graph attention to selectively focus on robust descriptors while disregarding areas such as clouds, trees, and several dynamic objects. This selective attention enables D2S to effectively perform a binary-semantic classification for sparse descriptors. Additionally, we propose a simple outdoor dataset to evaluate the capabilities of visual localization methods in scene-specific generalization and self-updating from unlabeled observations. Our approach outperforms the previous regression-based methods in both indoor and outdoor environments. It demonstrates the ability to generalize beyond training data, including scenarios involving transitions from day to night and adapting to domain shifts. The source code, trained models, dataset, and demo videos are available at the following link: https://thpjp.github.io/d2s.
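The abstract describes a loss that lets the network regress scene coordinates for robust descriptors while learning to disregard unreliable ones (clouds, trees, dynamic objects). The exact loss is not given here, so the following is only a minimal sketch of one plausible form, not the authors' formulation. It assumes each sparse descriptor has a predicted 3D coordinate, a predicted reliability logit, and a ground-truth binary reliability label. The function name masked_coordinate_loss, the weight alpha, and the choice of binary cross-entropy are all hypothetical.

```python
import math


def masked_coordinate_loss(pred_xyz, pred_logit, gt_xyz, gt_reliable, alpha=1.0):
    """Hypothetical loss (an assumption, not taken from the paper).

    pred_xyz    : list of predicted (x, y, z) scene coordinates
    pred_logit  : list of predicted reliability logits, one per descriptor
    gt_xyz      : list of ground-truth (x, y, z) scene coordinates
    gt_reliable : list of 0/1 labels, 1 = descriptor has a trusted 3D point
    alpha       : weight of the classification term (illustrative value)
    """
    coord_sum = 0.0
    cls_sum = 0.0
    n_reliable = 0
    for p, logit, g, r in zip(pred_xyz, pred_logit, gt_xyz, gt_reliable):
        # Numerically stable binary cross-entropy on the reliability logit.
        cls_sum += max(logit, 0.0) - logit * r + math.log1p(math.exp(-abs(logit)))
        # Coordinate error counts only for descriptors labeled reliable,
        # so unreliable regions never pull on the regression head.
        if r:
            coord_sum += math.dist(p, g)
            n_reliable += 1
    coord_term = coord_sum / max(n_reliable, 1)
    cls_term = cls_sum / max(len(gt_reliable), 1)
    return coord_term + alpha * cls_term


# Two descriptors: the first is reliable, the second (e.g. on a cloud) is not.
loss = masked_coordinate_loss(
    pred_xyz=[(3.0, 4.0, 0.0), (9.0, 9.0, 9.0)],
    pred_logit=[0.0, 0.0],
    gt_xyz=[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    gt_reliable=[1, 0],
)
print(loss)
```

In this sketch the coordinate prediction for the second descriptor has no effect on the loss, which is the masking behavior the abstract alludes to. At test time, the predicted reliability would be thresholded to discard descriptors before pose estimation.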
