Deep Homography Estimation for Visual Place Recognition (2402.16086v2)
Abstract: Visual place recognition (VPR) is a fundamental task for many applications such as robot localization and augmented reality. Recently, hierarchical VPR methods have received considerable attention because they offer a good trade-off between accuracy and efficiency. They usually first use global features to retrieve candidate images, then verify the spatial consistency of matched local features to re-rank them. However, the latter step typically relies on the RANSAC algorithm to fit a homography, which is time-consuming and non-differentiable. As a result, existing methods compromise by training the network only for global feature extraction. Here, we propose a transformer-based deep homography estimation (DHE) network that takes the dense feature map extracted by a backbone network as input and fits a homography for fast and learnable geometric verification. Moreover, we design a re-projection error of inliers loss to train the DHE network without additional homography labels; the DHE network can also be jointly trained with the backbone network to help it extract features that are more suitable for local matching. Extensive experiments on benchmark datasets show that our method can outperform several state-of-the-art methods and is more than one order of magnitude faster than the mainstream hierarchical VPR methods using RANSAC. The code is released at https://github.com/Lu-Feng/DHE-VPR.
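To make the geometric-verification idea concrete, the following is a minimal sketch of how a re-projection error over inliers could be computed for a given homography and used to re-rank retrieved candidates. It is not the authors' implementation: the function names (`project`, `inlier_reprojection_error`, `rerank`), the pixel `threshold` of 3.0, and the choice to score candidates by inlier count are all assumptions made for illustration. The paper trains a network with a loss of this kind, whereas this sketch only evaluates the quantity on fixed point matches.

```python
import math

def project(H, pt):
    """Map a 2D point through a 3x3 homography H given as nested lists."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Assumes w != 0, i.e. the point is not mapped to infinity.
    return (u / w, v / w)

def inlier_reprojection_error(H, src, dst, threshold=3.0):
    """Return (mean error over inliers, inlier count).

    A match (p, q) counts as an inlier when the distance between
    project(H, p) and q is below `threshold` (an assumed value).
    """
    errors = [math.dist(project(H, p), q) for p, q in zip(src, dst)]
    inlier_errors = [e for e in errors if e < threshold]
    if not inlier_errors:
        return float("inf"), 0
    return sum(inlier_errors) / len(inlier_errors), len(inlier_errors)

def rerank(candidates, threshold=3.0):
    """Order candidate ids by inlier count, highest first.

    `candidates` is a list of (id, H, src_points, dst_points) tuples;
    scoring by inlier count is an assumed rule, not taken from the paper.
    """
    scored = [
        (cid, inlier_reprojection_error(H, src, dst, threshold)[1])
        for cid, H, src, dst in candidates
    ]
    scored.sort(key=lambda s: s[1], reverse=True)
    return [cid for cid, _ in scored]
```

For example, with a pure-translation homography `H = [[1, 0, 2], [0, 1, 1], [0, 0, 1]]`, a candidate whose matched points all follow that translation would be ranked ahead of one containing a grossly mismatched point.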
Authors:
- Feng Lu
- Shuting Dong
- Lijun Zhang
- Bingxi Liu
- Xiangyuan Lan
- Dongmei Jiang
- Chun Yuan