AEGIS-Net: Attention-guided Multi-Level Feature Aggregation for Indoor Place Recognition (2312.09538v1)
Abstract: We present AEGIS-Net, a novel indoor place recognition model that takes RGB point clouds as input and generates global place descriptors by aggregating lower-level color and geometry features with higher-level implicit semantic features. Rather than simply concatenating features, self-attention modules are employed to select the local features that best describe an indoor place. Our AEGIS-Net consists of a semantic encoder, a semantic decoder and an attention-guided feature embedding. The model is trained in two stages: the first focuses on an auxiliary semantic segmentation task and the second on the place recognition task. We evaluate AEGIS-Net on the ScanNetPR dataset and compare it against a pre-deep-learning feature-based method and five state-of-the-art deep-learning-based methods. AEGIS-Net outperforms all six methods.
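To make the aggregation idea concrete, below is a minimal PyTorch-style sketch of an attention-guided feature embedding: per-point lower-level (color/geometry) and higher-level (implicit semantic) features are fused, re-weighted with self-attention, and pooled into a single L2-normalised global descriptor. The module names, feature dimensions, residual connection and average pooling are illustrative assumptions for this sketch, not the paper's exact architecture.

```python
# Sketch only: assumes per-point features are already extracted by the
# semantic encoder/decoder; dimensions and pooling choice are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentionBlock(nn.Module):
    """Scaled dot-product self-attention over N local features (B, N, C)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = torch.softmax(self.q(x) @ self.k(x).transpose(1, 2) * self.scale, dim=-1)
        return x + attn @ self.v(x)  # residual keeps un-attended local detail


class AttentionGuidedEmbedding(nn.Module):
    """Fuse low-level (color/geometry) and high-level (semantic) features,
    re-weight them with self-attention, and pool into a global descriptor."""

    def __init__(self, low_dim: int = 64, high_dim: int = 128, out_dim: int = 256):
        super().__init__()
        self.fuse = nn.Linear(low_dim + high_dim, out_dim)
        self.attn = SelfAttentionBlock(out_dim)

    def forward(self, low_feats: torch.Tensor, high_feats: torch.Tensor) -> torch.Tensor:
        x = self.fuse(torch.cat([low_feats, high_feats], dim=-1))  # (B, N, out_dim)
        x = self.attn(x)                      # attention selects informative points
        x = x.mean(dim=1)                     # average pooling (assumption)
        return F.normalize(x, dim=-1)         # L2-normalised place descriptor


if __name__ == "__main__":
    low = torch.randn(2, 1024, 64)    # e.g. color + geometry features per point
    high = torch.randn(2, 1024, 128)  # e.g. implicit semantic features per point
    descriptor = AttentionGuidedEmbedding()(low, high)
    print(descriptor.shape)           # torch.Size([2, 256])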