
MGNet: Learning Correspondences via Multiple Graphs (2401.04984v1)

Published 10 Jan 2024 in cs.CV

Abstract: Learning correspondences aims to find correct correspondences (inliers) from the initial correspondence set with an uneven correspondence distribution and a low inlier rate, which can be regarded as graph data. Recent advances usually use graph neural networks (GNNs) to build a single type of graph or simply stack local graphs into the global one to complete the task. But they ignore the complementary relationship between different types of graphs, which can effectively capture potential relationships among sparse correspondences. To address this problem, we propose MGNet to effectively combine multiple complementary graphs. To obtain information integrating implicit and explicit local graphs, we construct local graphs from implicit and explicit aspects and combine them effectively, which is used to build a global graph. Moreover, we propose Graph Soft Degree Attention (GSDA) to make full use of all sparse correspondence information at once in the global graph, which can capture and amplify discriminative features. Extensive experiments demonstrate that MGNet outperforms state-of-the-art methods in different visual tasks. The code is provided at https://github.com/DAILUANYUAN/MGNet-2024AAAI.


Summary

  • The paper introduces a multi-graph network that uses the Graph Soft Degree Attention mechanism to effectively distinguish inliers from outliers.
  • The method integrates local and global graph representations, significantly improving camera pose estimation, homography, and visual localization tasks.
  • Empirical results demonstrate that MGNet achieves superior performance with fewer parameters, underlining its efficiency and practical impact.

Introduction to MGNet

Understanding pixel-wise correspondences between different images is a cornerstone of computer vision. These correspondences underpin tasks such as image stitching, SLAM, and 3D reconstruction. The challenge lies in identifying the correct correspondences, called inliers, among a far larger set of false ones (outliers), particularly when the inlier rate is low and the distribution of putative matches is uneven, as is common in real-world scenarios.

Understanding Multiple Graphs

Traditional approaches based on graph neural networks (GNNs) represent correspondences as a graph, but usually build only a single type of graph to separate inliers from outliers. More recent methods stack local graphs into a larger global one, yet they still overlook the complementary benefits of constructing multiple graph types simultaneously. Combining complementary graphs can capture the nuanced relationships among sparse correspondences more effectively.
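To make the idea of complementary graphs concrete, here is a minimal sketch (not the paper's exact formulation; the kNN construction, the `k` value, and the max-fusion step are illustrative assumptions): an "explicit" local graph links each correspondence to its nearest neighbours in coordinate space, while an "implicit" one links neighbours in learned feature space, and the two can then be fused.

```python
import numpy as np

def knn_adjacency(x, k):
    """Binary kNN adjacency: connect each node to its k nearest rows of x."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                                  # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]                          # k nearest per node
    adj = np.zeros((len(x), len(x)))
    rows = np.repeat(np.arange(len(x)), k)
    adj[rows, nbrs.ravel()] = 1.0
    return adj

rng = np.random.default_rng(0)
coords = rng.random((6, 4))    # putative correspondences as (x1, y1, x2, y2)
feats = rng.random((6, 32))    # learned per-correspondence features

explicit = knn_adjacency(coords, k=2)      # geometry-based neighbours
implicit = knn_adjacency(feats, k=2)       # feature-space neighbours
combined = np.maximum(explicit, implicit)  # one simple way to fuse the two graphs
```

The point of the fusion step is that the two graphs disagree in informative ways: two matches can be close in the image yet dissimilar in feature space, or vice versa, and keeping both edge sets preserves that signal.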

MGNet builds on this insight with a network architecture that combines multiple complementary graphs. It introduces a Graph Soft Degree Attention (GSDA) mechanism that uses all sparse correspondence information at once in the global graph, enabling the network to capture and amplify the discriminative features needed to tell inliers from outliers.

Breaking Down the Approach

To capture both implicit and explicit information, MGNet constructs local graphs from these dual perspectives and then exploits the interrelations between them to build a global graph. Through GSDA, MGNet synthesizes global information to spotlight and amplify the most discriminative cues. As reported in the paper, the method outperforms its predecessors across multiple visual tasks, realizing a greater utility of GNNs for pruning sparse correspondences.
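One plausible reading of the soft-degree idea can be sketched as follows (an illustrative simplification, not the paper's exact operator: it assumes the soft degree of each node is the column sum of a row-normalized similarity matrix over the global graph, turned into attention weights that rescale node features):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def graph_soft_degree_attention(feats):
    """Reweight each correspondence by its 'soft degree' in the global graph."""
    sim = feats @ feats.T               # dense similarity: the global graph
    soft_adj = softmax(sim, axis=1)     # soft, row-normalized adjacency
    soft_degree = soft_adj.sum(axis=0)  # how strongly other nodes attend to each node
    attn = softmax(soft_degree)         # turn soft degrees into attention weights
    return feats * attn[:, None]        # amplify well-connected (likely inlier) nodes

rng = np.random.default_rng(1)
feats = rng.standard_normal((8, 16))    # 8 correspondences, 16-dim features
out = graph_soft_degree_attention(feats)
```

The intuition is that inliers tend to be mutually consistent and therefore accumulate a high soft degree in the global graph, while outliers attract little attention and are suppressed, all in one pass over the full correspondence set.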

Empirical Validation

Extensive experiments underline MGNet's advantage over state-of-the-art methods in camera pose estimation, homography estimation, and visual localization. Notably, it achieves this performance with fewer parameters than competing methods, an indication of its efficiency. The open-source codebase encourages further exploration and development within the community.

Conclusion

MGNet marks a significant stride in correspondence learning. By embracing a multi-graph perspective and introducing the Graph Soft Degree Attention mechanism, it sets a new benchmark in the field. Its strong results across varied tests attest to its robustness and hold promise for a wide range of applications in both academia and industry.
