Learning Second-Order Attentive Context for Efficient Correspondence Pruning (2303.15761v1)
Abstract: Correspondence pruning aims to identify consistent correspondences (inliers) in a set of putative correspondences. It is challenging because the numerous outliers are spatially disorganized, especially when outliers heavily dominate the putative correspondences. Ensuring effectiveness while maintaining efficiency is more challenging still. In this paper, we propose an effective and efficient method for correspondence pruning. Inspired by the success of attentive context in correspondence problems, we first extend the attentive context to the first-order attentive context and then introduce the idea of attention in attention (ANA) to model second-order attentive context for correspondence pruning. Whereas first-order attention focuses on feature-consistent context, second-order attention operates on the attention weights themselves and provides an additional source from which to encode consistent context: the attention map. For efficiency, we derive two approximate formulations of the naive implementation of second-order attention that reduce its cubic complexity to linear complexity, so that second-order attention can be used with negligible computational overhead. We further implement our formulations in a second-order context layer and incorporate that layer into an ANA block. Extensive experiments demonstrate that our method is effective and efficient in pruning outliers, especially in high-outlier-ratio cases. Compared with the state-of-the-art correspondence pruning approach LMCNet, our method runs 14 times faster while maintaining competitive accuracy.
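To illustrate where a cubic cost can come from, here is a minimal NumPy sketch of one possible reading of "attention in attention". It is an assumption made for illustration, not the paper's formulation: first-order attention is taken to be scaled dot-product self-attention over per-correspondence features, and second-order attention is taken to compare rows of that attention map with each other. The function names, the softmax normalization, and the feature shapes are all hypothetical, and the two linear-complexity approximations mentioned in the abstract are not reproduced because the abstract does not state them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def first_order_attention(feats):
    """Assumed first-order attention: feature-to-feature similarity.

    feats: (N, d) array, one feature vector per putative correspondence.
    Returns an (N, N) row-stochastic attention map.
    """
    d = feats.shape[1]
    scores = feats @ feats.T / np.sqrt(d)
    return softmax(scores, axis=-1)


def naive_second_order_attention(attn):
    """Assumed second-order attention: similarity between attention rows.

    attn: (N, N) first-order attention map.
    The product attn @ attn.T needs N * N * N multiply-adds, which is the
    cubic cost in N for this naive form.
    """
    scores = attn @ attn.T
    return softmax(scores, axis=-1)


def second_order_context(feats):
    """Aggregate features using the (assumed) second-order attention map."""
    a1 = first_order_attention(feats)
    a2 = naive_second_order_attention(a1)
    return a2 @ feats
```

With N putative correspondences, the `attn @ attn.T` step dominates the cost of this sketch, which is why a naive second-order map becomes impractical as N grows into the thousands.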