IMP: Iterative Matching and Pose Estimation with Adaptive Pooling (2304.14837v2)
Abstract: Previous methods solve feature matching and pose estimation in a two-stage process, first finding matches and then estimating the pose. Because they ignore the geometric relationships between the two tasks, they focus on either improving the quality of matches or filtering potential outliers, leading to limited efficiency or accuracy. In contrast, we propose an iterative matching and pose estimation framework (IMP) that leverages the geometric connections between the two tasks: a few good matches are enough for a roughly accurate pose estimate, and a roughly accurate pose can guide the matching by providing geometric constraints. To this end, we implement a geometry-aware recurrent attention-based module which jointly outputs sparse matches and camera poses. First, at each iteration, we implicitly embed geometric information into the module via a pose-consistency loss, allowing it to predict geometry-aware matches progressively. Second, we introduce an efficient IMP, called EIMP, which dynamically discards keypoints without potential matches, avoiding redundant updating and significantly reducing the quadratic time complexity of attention computation in transformers. Experiments on the YFCC100M, ScanNet, and Aachen Day-Night datasets demonstrate that the proposed method outperforms previous approaches in terms of accuracy and efficiency.
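To make the statement that "a roughly accurate pose can guide the matching by providing geometric constraints" concrete, here is a minimal sketch using the classical epipolar constraint. It is not the learned module described in the abstract: it only shows how a relative pose (R, t) lets one score candidate correspondences. The function names (`skew`, `sampson_distance`, `pose_guided_mask`), the threshold value, and the synthetic scene are all assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def sampson_distance(x0, x1, R, t):
    """Sampson epipolar error for normalized image points (N x 2 arrays).

    Assumes the convention X1 = R @ X0 + t, so the essential matrix is
    E = [t]_x R and a perfect correspondence satisfies x1^T E x0 = 0.
    """
    E = skew(t) @ R
    h0 = np.hstack([x0, np.ones((len(x0), 1))])  # homogeneous points, image 0
    h1 = np.hstack([x1, np.ones((len(x1), 1))])  # homogeneous points, image 1
    Ex0 = h0 @ E.T    # row i is E @ h0[i]
    Etx1 = h1 @ E     # row i is E.T @ h1[i]
    num = np.sum(h1 * Ex0, axis=1) ** 2
    den = Ex0[:, 0] ** 2 + Ex0[:, 1] ** 2 + Etx1[:, 0] ** 2 + Etx1[:, 1] ** 2
    return num / den

def pose_guided_mask(x0, x1, R, t, thresh=1e-4):
    """Keep candidate matches consistent with the pose (hypothetical threshold)."""
    return sampson_distance(x0, x1, R, t) < thresh

# Synthetic scene: 50 points in front of both cameras.
rng = np.random.default_rng(0)
X0 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))
a = 0.1  # small rotation about the y axis, in radians
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.5, 0.0, 0.1])
X1 = X0 @ R.T + t

x0 = X0[:, :2] / X0[:, 2:]   # normalized image coordinates, view 0
x1 = X1[:, :2] / X1[:, 2:]   # normalized image coordinates, view 1

x1_noisy = x1.copy()
x1_noisy[:10, 1] += 0.2      # corrupt the first 10 matches off their epipolar lines

mask = pose_guided_mask(x0, x1_noisy, R, t)
print("kept", int(mask.sum()), "of", len(mask), "candidate matches")
```

In an iterative scheme of the kind the abstract describes, a check like this would run on a pose estimated from the current matches rather than on a ground-truth pose, and the threshold would presumably be loosened while the pose is still rough.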