SFD2: Semantic-guided Feature Detection and Description (2304.14845v2)
Abstract: Visual localization is a fundamental task for various applications including autonomous driving and robotics. Prior methods focus on extracting large amounts of often redundant, locally reliable features, resulting in limited efficiency and accuracy, especially in large-scale environments under challenging conditions. Instead, we propose to extract globally reliable features by implicitly embedding high-level semantics into both the detection and description processes. Specifically, our semantic-aware detector is able to detect keypoints from reliable regions (e.g., building, traffic lane) and suppress unreliable areas (e.g., sky, car) implicitly instead of relying on explicit semantic labels. This boosts the accuracy of keypoint matching by reducing the number of features sensitive to appearance changes and avoiding the need for additional segmentation networks at test time. Moreover, our descriptors are augmented with semantics and have stronger discriminative ability, providing more inliers at test time. In particular, experiments on the long-term, large-scale visual localization datasets Aachen Day-Night and RobotCar-Seasons demonstrate that our model outperforms previous local features and achieves accuracy competitive with advanced matchers while being about 2 and 3 times faster when using 2k and 4k keypoints, respectively.