
Learning Two-View Correspondences and Geometry Using Order-Aware Network (1908.04964v1)

Published 14 Aug 2019 in cs.CV, cs.CG, and cs.LG

Abstract: Establishing correspondences between two images requires both local and global spatial context. Given putative correspondences of feature points in two views, in this paper, we propose Order-Aware Network, which infers the probabilities of correspondences being inliers and regresses the relative pose encoded by the essential matrix. Specifically, this proposed network is built hierarchically and comprises three novel operations. First, to capture the local context of sparse correspondences, the network clusters unordered input correspondences by learning a soft assignment matrix. These clusters are in a canonical order and invariant to input permutations. Next, the clusters are spatially correlated to form the global context of correspondences. After that, the context-encoded clusters are recovered back to the original size through a proposed upsampling operator. We intensively experiment on both outdoor and indoor datasets. The accuracy of the two-view geometry and correspondences are significantly improved over the state-of-the-arts. Code will be available at https://github.com/zjhthu/OANet.git.

Citations (295)

Summary

  • The paper presents a novel neural architecture (OANet) that hierarchically processes local and global contexts to improve two-view correspondence matching.
  • The approach integrates differentiable pooling, order-aware unpooling, and filtering to boost the accuracy of relative pose estimation.
  • Experiments on the outdoor YFCC100M and indoor SUN3D datasets show significant gains in correspondence and pose accuracy over prior state-of-the-art methods.

Overview of Order-Aware Network for Two-View Correspondences

This paper introduces the Order-Aware Network (OANet), which improves two-view geometry estimation by learning from putative correspondences between two images. It has two objectives: to classify each putative correspondence as an inlier or an outlier more accurately, and to improve the relative pose estimate encoded by the essential matrix.
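To make the link between inlier weights and the essential matrix concrete, here is a minimal NumPy sketch. It assumes a weighted eight-point solve, which is one standard way to turn per-correspondence weights into an essential matrix; the summary does not state that this is exactly what OANet does. The function names, the synthetic pose, and the 0/1 weights are all hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def weighted_eight_point(x1, x2, w):
    """Estimate E (up to scale and sign) from normalized points and weights.

    x1, x2: (N, 2) normalized image coordinates; w: (N,) non-negative weights.
    Each row of A encodes x2h^T E x1h = 0 for row-major flattened E.
    """
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    ones = np.ones(len(x1))
    A = np.stack([u2 * u1, u2 * v1, u2,
                  v2 * u1, v2 * v1, v2,
                  u1, v1, ones], axis=1)
    # Right singular vector of the smallest singular value minimizes ||diag(w) A e||.
    _, _, vt = np.linalg.svd(w[:, None] * A)
    E = vt[-1].reshape(3, 3)
    return E / np.linalg.norm(E)

rng = np.random.default_rng(0)

# Hypothetical ground-truth pose: small rotation about the y axis plus a translation.
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, 0.1])
E_true = skew(t) @ R
E_true /= np.linalg.norm(E_true)

# Synthetic 3D points in front of both cameras, projected to normalized coordinates.
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(40, 3))
X2 = X @ R.T + t
x1 = X[:, :2] / X[:, 2:]
x2 = X2[:, :2] / X2[:, 2:]

# Turn the last 10 matches into outliers and give them zero weight,
# standing in for the inlier probabilities a network would predict.
x2[30:] = rng.uniform(-1.0, 1.0, size=(10, 2))
w = np.concatenate([np.ones(30), np.zeros(10)])

E_est = weighted_eight_point(x1, x2, w)
err = min(np.linalg.norm(E_est - E_true), np.linalg.norm(E_est + E_true))
print("essential matrix error (up to sign):", err)
```

With perfect weights the solve recovers the true essential matrix up to scale and sign. In practice the weights are soft probabilities, so the quality of the pose depends directly on how well the network separates inliers from outliers.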

Key Contributions

The primary contribution is a neural network architecture that hierarchically processes the local and global spatial context of correspondences. The authors propose three operations within the OANet architecture:

  1. Differentiable Pooling (DiffPool): This layer clusters the unordered input correspondences to capture local context. It learns a soft assignment matrix that maps the features to clusters in a canonical order, so the pooled output is invariant to permutations of the input.
  2. Order-Aware Differentiable Unpooling (DiffUnpool): This layer upsamples the context-encoded clusters back to the original number of correspondences. Because it is aware of the input order, each recovered feature stays aligned with the input correspondence it belongs to.
  3. Order-Aware Filtering: This block applies spatial correlation across the clusters, which is possible because DiffPool places them in a canonical order. Modeling relations between clusters in this way captures the global context of the correspondences.
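The three operations above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the random matrices `W_pool`, `W_unpool`, and `W_spatial` stand in for learned layers, the sizes are arbitrary, and the softmax axes (over points for pooling, over clusters for unpooling) are choices made for this sketch.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, D, M = 6, 4, 3  # correspondences, feature channels, clusters (hypothetical sizes)

# Random stand-ins for learned weights (assumptions of this sketch).
W_pool = rng.normal(size=(D, M))
W_unpool = rng.normal(size=(D, M))
W_spatial = rng.normal(size=(M, M))

def diff_pool(X):
    """Pool N unordered features (N, D) into M clusters (M, D)."""
    S = softmax(X @ W_pool, axis=0)  # (N, M) soft assignment, computed per point
    return S.T @ X                   # each cluster is a weighted average of points

def order_aware_filter(C):
    """Mix information across clusters; meaningful because cluster order is canonical."""
    return C + np.maximum(W_spatial @ C, 0.0)  # residual connection plus ReLU

def diff_unpool(X, C):
    """Recover N features from M clusters, using the original features X for order."""
    S = softmax(X @ W_unpool, axis=1)  # (N, M): each point attends over clusters
    return S @ C                       # (N, D), aligned row by row with X

X = rng.normal(size=(N, D))
clusters = order_aware_filter(diff_pool(X))
recovered = diff_unpool(X, clusters)

# Shuffling the input leaves the clusters unchanged (invariance), while the
# unpooled output is shuffled in the same way as the input (equivariance).
perm = rng.permutation(N)
clusters_p = order_aware_filter(diff_pool(X[perm]))
recovered_p = diff_unpool(X[perm], clusters_p)
assert np.allclose(clusters, clusters_p)
assert np.allclose(recovered[perm], recovered_p)
```

The two assertions at the end check the properties the list describes: pooling does not depend on input order, and unpooling returns features in the same order as the input correspondences.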

Experimental Results

Extensive experiments on the large-scale outdoor dataset YFCC100M and the indoor dataset SUN3D show significant improvements over existing state-of-the-art methods. Notably, OANet achieved higher accuracy in relative pose estimation under diverse conditions, outperforming traditional methods that rely heavily on geometric heuristics.

Two further ingredients improved performance empirically: applying the network iteratively, and training with a geometry-based loss on the essential matrix, described here as the Gold Standard geometry loss. Architectural choices, such as multi-level designs, further improved how spatial context is captured.
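The summary does not give the formula for the geometry loss. As an illustration only, the sketch below uses the Sampson distance, a common first-order geometric residual for an essential matrix; treating it as a loss by taking the mean is an assumption of this example, and the pose and points are hypothetical.

```python
import numpy as np

def sampson_distance(E, x1, x2):
    """Per-correspondence Sampson distance for normalized points x1, x2 of shape (N, 2)."""
    x1h = np.hstack([x1, np.ones((len(x1), 1))])  # homogeneous coordinates
    x2h = np.hstack([x2, np.ones((len(x2), 1))])
    Ex1 = x1h @ E.T   # row i is E @ x1h[i]
    Etx2 = x2h @ E    # row i is E.T @ x2h[i]
    num = np.sum(x2h * Ex1, axis=1) ** 2          # (x2h^T E x1h)^2
    den = Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2 + Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2
    return num / den

# Hypothetical pose: pure translation along x with no rotation, so E = [t]_x.
# The epipolar constraint then reduces to "matching points share the same y".
E = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

x1 = np.array([[0.1, 0.2], [0.3, -0.1]])
x2 = np.array([[0.4, 0.2], [0.0, 0.1]])  # first pair is consistent, second is not

d = sampson_distance(E, x1, x2)
loss = d.mean()
print(d)     # approximately [0.0, 0.02]
print(loss)  # approximately 0.01
```

For this pose the distance simplifies to (y1 - y2)^2 / 2, which gives 0 for the first pair and 0.02 for the second. A residual of this kind penalizes an estimated essential matrix by how far correspondences fall from their epipolar lines, rather than by the raw difference between matrix entries.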

Implications and Future Directions

Applying DiffPool and DiffUnpool to sparse matching suggests a way to better model spatial patterns in point correspondences. The hierarchical combination of local and global context in OANet is promising for future research on geometric computer vision systems, particularly Structure from Motion and SLAM.

OANet could also be integrated into broader deep-learning frameworks and extended to other alignment and pose estimation tasks across a range of geometric and visual data types. Future work could focus on scaling these ideas to real-time applications and on integrating other sensor modalities for greater robustness.

OANet, as introduced in this paper, advances the use of machine learning for geometric correspondence tasks and provides a basis for further measurable improvements in visual computing methods.