
Geometry-aware Feature Matching for Large-Scale Structure from Motion (2409.02310v3)

Published 3 Sep 2024 in cs.CV

Abstract: Establishing consistent and dense correspondences across multiple images is crucial for Structure from Motion (SfM) systems. Significant view changes, such as air-to-ground with very sparse view overlap, pose an even greater challenge to the correspondence solvers. We present a novel optimization-based approach that significantly enhances existing feature matching methods by introducing geometry cues in addition to color cues. This helps fill gaps when there is less overlap in large-scale scenarios. Our method formulates geometric verification as an optimization problem, guiding feature matching within detector-free methods and using sparse correspondences from detector-based methods as anchor points. By enforcing geometric constraints via the Sampson Distance, our approach ensures that the denser correspondences from detector-free methods are geometrically consistent and more accurate. This hybrid strategy significantly improves correspondence density and accuracy, mitigates multi-view inconsistencies, and leads to notable advancements in camera pose accuracy and point cloud density. It outperforms state-of-the-art feature matching methods on benchmark datasets and enables feature matching in challenging extreme large-scale settings.

References (49)
  1. Building Rome in a Day. In 2009 IEEE 12th International Conference on Computer Vision, pages 72–79, 2009.
  2. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In CVPR, 2017.
  3. Speeded-Up Robust Features (SURF). CVIU, 110(3):346–359, 2008.
  4. 3D model acquisition from extended image sequences. In Computer Vision — ECCV ’96, pages 683–695, Berlin, Heidelberg, 1996. Springer Berlin Heidelberg.
  5. ASpanFormer: Detector-free image matching with adaptive span transformer. ECCV, 2022.
  6. Universal correspondence network. NeurIPS, 2016.
  7. ScanNet: Richly-annotated 3D reconstructions of indoor scenes, 2017.
  8. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1052–1067, 2007.
  9. SuperPoint: Self-supervised interest point detection and description. CVPR Workshops, pages 224–236, 2018.
  10. DKM: Dense kernelized feature matching for geometry estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2023.
  11. RoMa: Robust dense feature matching. IEEE Conference on Computer Vision and Pattern Recognition, 2024.
  12. StereoScan: Dense 3D reconstruction in real-time. In 2011 IEEE Intelligent Vehicles Symposium (IV), pages 963–968, 2011.
  13. Multiple View Geometry in Computer Vision. Cambridge University Press, New York, NY, USA, 2nd edition, 2003.
  14. Adaptive assignment for geometry aware local feature matching. CVPR, 2023.
  15. Image matching across wide baselines: From paper to practice. International Journal of Computer Vision, 2020.
  16. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023a.
  17. 3D Gaussian splatting for real-time radiance field rendering, 2023b.
  18. DenseGAP: Graph-structured dense correspondence learning with anchor points. In 2022 26th International Conference on Pattern Recognition (ICPR), 2022.
  19. Dual-resolution correspondence networks. NeurIPS, 2020.
  20. MegaDepth: Learning single-view depth prediction from internet photos. In CVPR, 2018.
  21. Pixel-perfect Structure-from-Motion with featuremetric refinement. In ICCV, 2021.
  22. LightGlue: Local feature matching at light speed. In ICCV, 2023.
  23. SIFT Flow: Dense correspondence across scenes and its applications. T-PAMI, 2010.
  24. David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, 2004.
  25. NeRF: Representing scenes as neural radiance fields for view synthesis, 2020.
  26. Keypoint detection in RGBD images based on an anisotropic scale space. IEEE Transactions on Multimedia, 18(9):1762–1771, 2016.
  27. Relative 3D reconstruction using multiple uncalibrated images. The International Journal of Robotics Research, 14(6):619–632, 1995.
  28. Real-time localization and 3D reconstruction. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), pages 363–370, 2006.
  29. DTAM: Dense tracking and mapping in real-time. In 2011 International Conference on Computer Vision, pages 2320–2327, 2011.
  30. DINOv2: Learning robust visual features without supervision, 2024.
  31. DiffPoseNet: Direct differentiable camera pose estimation, 2022.
  32. Fast and accurate camera covariance computation for large 3D reconstruction, 2018.
  33. Neighbourhood consensus networks. NeurIPS, 2018.
  34. Efficient neighbourhood consensus networks via submanifold sparse convolutions. ECCV, 2020.
  35. ORB: An efficient alternative to SIFT or SURF. ICCV, 2011.
  36. SLAM++: Simultaneous localisation and mapping at the level of objects. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 1352–1359, 2013.
  37. SuperGlue: Learning feature matching with graph neural networks. In CVPR, 2020.
  38. Self-supervised visual descriptor learning for dense correspondence. RA-L, 2016.
  39. Structure-from-Motion revisited. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  40. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016.
  41. LoFTR: Detector-free local feature matching with transformers. CVPR, 2021.
  42. SfM-Net: Learning of structure and motion from video, 2017.
  43. MatchFormer: Interleaving attention in transformers for feature matching. In Asian Conference on Computer Vision, 2022.
  44. Efficient LoFTR: Semi-dense local feature matching with sparse-like speed. In CVPR, 2024.
  45. LIFT: Learned invariant feature transform. ECCV, 2016.
  46. DS-SLAM: A semantic visual SLAM towards dynamic environments. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1168–1174, 2018.
  47. RelPose: Predicting probabilistic relative rotation for single objects in the wild, 2022.
  48. ALIKED: A lighter keypoint and descriptor extraction network via deformable transformation, 2023.
  49. Patch2Pix: Epipolar-guided pixel-level correspondences. In CVPR, 2021.
Authors (8)
  1. Gonglin Chen (3 papers)
  2. Jinsen Wu (2 papers)
  3. Haiwei Chen (8 papers)
  4. Wenbin Teng (5 papers)
  5. Zhiyuan Gao (5 papers)
  6. Andrew Feng (27 papers)
  7. Rongjun Qin (47 papers)
  8. Yajie Zhao (22 papers)

Summary

Geometry-aware Feature Matching for Large-Scale Structure from Motion

This paper, "Geometry-aware Feature Matching for Large-Scale Structure from Motion", addresses the challenge of establishing consistent and dense correspondences in Structure from Motion (SfM) systems, particularly under significant viewpoint changes such as air-to-ground imagery with sparse view overlap.

Key Contributions

The authors introduce a novel optimization-based method that significantly enhances the performance of existing feature matching techniques by incorporating geometric cues alongside traditional color cues. The core of the method lies in a geometry-aware optimization module that leverages the Sampson Distance to enforce geometric consistency, refining dense correspondences iteratively. The method uses sparse correspondences derived from detector-based methods as anchor points, guiding the matching process within detector-free frameworks.

Methodology

The proposed method integrates both detector-based and detector-free feature matching approaches:

  1. Detector-based Methods: These methods, exemplified by SuperPoint and SuperGlue, identify keypoints and descriptors independently before matching them. They excel in scenarios with richly textured images and small viewpoint changes but falter under extreme conditions.
  2. Detector-free Methods: Methods like LoFTR, ASpanFormer, and MatchFormer perform dense matching in a single step using end-to-end training with neural networks. While they provide denser correspondences, they lack control over the consistency of keypoints across multiple views.
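To make the anchor-point idea concrete, the sketch below estimates a fundamental matrix from sparse detector-based matches using the classic normalized eight-point algorithm, then scores candidate dense matches by their distance to the induced epipolar lines. This is a minimal NumPy illustration of the general principle, not the paper's actual optimization module; all function names and the toy scene are our own.

```python
import numpy as np

def normalize_points(pts):
    """Similarity transform bringing points to zero mean, average norm sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(pts - mean, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return pts_h, T

def fundamental_from_anchors(p1, p2):
    """Normalized eight-point estimate of F from sparse anchors (Nx2, N >= 8)."""
    x1, T1 = normalize_points(p1)
    x2, T2 = normalize_points(p2)
    # One row per anchor: the epipolar constraint x2^T F x1 = 0, linear in F.
    A = np.stack([np.kron(b, a) for a, b in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint, then undo the normalization.
    U, S, Vt2 = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt2
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)

def epipolar_line_distance(F, p1, p2):
    """Pixel distance from each point in image 2 to its epipolar line F x1."""
    h1 = np.column_stack([p1, np.ones(len(p1))])
    h2 = np.column_stack([p2, np.ones(len(p2))])
    lines = h1 @ F.T                               # epipolar lines in image 2
    residual = np.abs(np.sum(h2 * lines, axis=1))  # |x2^T F x1|
    return residual / np.linalg.norm(lines[:, :2], axis=1)

# Toy two-view scene: project random 3D points with known intrinsics and motion.
rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, 0.1])
X = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 10.0], size=(12, 3))

def project(K, Xc):
    p = Xc @ K.T
    return p[:, :2] / p[:, 2:]

anchors1, anchors2 = project(K, X), project(K, X @ R.T + t)
F = fundamental_from_anchors(anchors1, anchors2)
dists = epipolar_line_distance(F, anchors1, anchors2)  # near zero for inliers
```

In a real pipeline the anchors would come from a sparse matcher such as SuperPoint + SuperGlue (typically with RANSAC), and the distance check would be applied to the much denser detector-free matches.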

The method formulates geometric verification as an optimization problem. By integrating sparse correspondences as geometric priors and enforcing geometric constraints via the Sampson Distance, the method iteratively refines and reassigns correspondences to ensure geometric consistency. This hybrid strategy combines the strengths of both approaches, offering improved correspondence density and accuracy and mitigating multi-view inconsistencies.
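The Sampson distance used for this verification is the standard first-order approximation to the geometric reprojection error under a fundamental matrix F. A minimal NumPy sketch of how such a check can gate correspondences (the toy F and the threshold are illustrative assumptions, not values from the paper):

```python
import numpy as np

def sampson_distance(F, p1, p2):
    """First-order geometric error of matches (p1, p2) under fundamental matrix F."""
    h1 = np.column_stack([p1, np.ones(len(p1))])
    h2 = np.column_stack([p2, np.ones(len(p2))])
    Fx1 = h1 @ F.T   # epipolar lines in image 2
    Ftx2 = h2 @ F    # epipolar lines in image 1
    num = np.sum(h2 * Fx1, axis=1) ** 2  # (x2^T F x1)^2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

# Toy geometry: pure sideways translation gives F = [e]_x with e = (1, 0, 0),
# so corresponding points must lie on the same image row.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
p1 = np.array([[100.0, 50.0], [200.0, 80.0]])
p2 = np.array([[120.0, 50.0], [210.0, 81.0]])  # second match drifts one row
d = sampson_distance(F, p1, p2)
keep = d < 0.25  # illustrative threshold; inconsistent matches are dropped
```

Minimizing this distance over candidate match positions, rather than merely thresholding it, is what turns the verification into the optimization the authors describe.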

Experimental Results

The method has been evaluated on publicly available datasets, including the Image Matching Competition Benchmark and MegaDepth, as well as two specially collected air-to-ground datasets. The results demonstrate that:

  • Pose Estimation: The method achieves superior accuracy in camera pose estimation compared to state-of-the-art methods. For instance, on the IMC Phototourism benchmark, the method achieves an AUC of 90.1 @ 10°, outperforming SuperPoint + SuperGlue and ALIKED + LightGlue.
  • Air-to-Ground Reconstruction: The method successfully registers all images and aligns UAV images with ground images, producing superior 3D models even in challenging large-scale scenarios.
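The AUC @ 10° figure is the standard pose-accuracy metric: the area under the recall-vs-error-threshold curve up to 10 degrees. A minimal sketch of how it is commonly computed (the step count and trapezoidal integration follow common practice, not necessarily this paper's exact evaluation code):

```python
import numpy as np

def pose_auc(errors_deg, max_threshold_deg=10.0, num_steps=1000):
    """Area under the recall-vs-threshold curve, normalized to [0, 1]."""
    errors = np.sort(np.asarray(errors_deg, dtype=float))
    ts = np.linspace(0.0, max_threshold_deg, num_steps + 1)
    # recall[i] = fraction of image pairs with pose error <= ts[i]
    recall = np.searchsorted(errors, ts, side="right") / len(errors)
    # Trapezoidal integration; zero error on every pair scores 1.0.
    area = np.sum((recall[:-1] + recall[1:]) / 2.0) * (ts[1] - ts[0])
    return area / max_threshold_deg

# Example: a 5-degree error on every pair gives an AUC@10 of roughly 0.5.
auc = pose_auc([5.0, 5.0, 5.0])
```

The per-pair error is usually the maximum of the angular rotation error and the angular translation error between estimated and ground-truth relative poses.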

Implications and Future Work

The implications of this research are significant for the field of computer vision, particularly in applications involving large-scale and challenging SfM scenarios. The method's ability to bridge detector-based and detector-free approaches could inspire further research and development in hybrid feature matching techniques.

Practically, the method could enhance various applications such as aerial mapping, autonomous navigation, and augmented reality by providing more accurate and robust 3D reconstructions from diverse and challenging datasets.

Future developments could focus on improving the efficiency of the algorithm, potentially through the integration of more computationally efficient backbone models or the application of multi-view refinement techniques. Additionally, the approach could be extended to other domains requiring robust feature matching under varying conditions.

Conclusion

The "Geometry-aware Feature Matching for Large-Scale Structure from Motion" paper presents a robust method for enhancing feature matching through geometric consistency, demonstrating significant improvements in SfM reconstruction accuracy. This method's hybrid approach effectively addresses the challenges posed by extreme viewpoint changes and sparse view overlap, marking a notable advancement in the field of computer vision.
