TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation (2404.01587v1)
Abstract: Visual place recognition (VPR) plays a pivotal role in the autonomous exploration and navigation of mobile robots in complex outdoor environments. While cost-effective and easy to deploy, camera sensors are sensitive to lighting and weather changes, and even slight image alterations can greatly affect VPR efficiency and precision. Existing methods overcome this by exploiting powerful yet large networks, leading to significant consumption of computational resources. In this paper, we propose TSCM, a distillation framework that pairs a high-performance teacher with a lightweight student. It exploits our proposed cross-metric knowledge distillation to narrow the performance gap between the teacher and student models, maintaining strong recognition performance while requiring minimal computational load during deployment. We conduct comprehensive evaluations on the large-scale Pittsburgh30k and Pittsburgh250k datasets. Experimental results demonstrate the superiority of our method over baseline models in terms of recognition accuracy and model parameter efficiency. Moreover, our ablation studies show that the proposed knowledge distillation technique surpasses other distillation approaches. The code of our method has been released at https://github.com/nubot-nudt/TSCM.
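To make the teacher-student idea concrete, below is a minimal sketch of what a cross-metric distillation objective for retrieval-style VPR might look like. It is not the authors' exact formulation: it assumes PyTorch, L2-normalized global descriptors, and triplet supervision, and the function name `cross_metric_distillation_loss` as well as the weights `alpha` and `beta` are illustrative placeholders.

```python
# Hypothetical sketch of a cross-metric distillation loss for VPR.
# Assumptions (not from the paper): PyTorch, (B, D) L2-normalized descriptors,
# teacher outputs computed under torch.no_grad().
import torch
import torch.nn.functional as F


def cross_metric_distillation_loss(s_a, s_p, s_n, t_a, t_p, t_n,
                                    margin=0.1, alpha=1.0, beta=1.0):
    """s_*: student anchor/positive/negative descriptors, t_*: teacher counterparts."""
    # Make sure no gradients flow into the (frozen) teacher.
    t_a, t_p, t_n = t_a.detach(), t_p.detach(), t_n.detach()

    # Standard triplet loss in the student's own embedding space:
    # pull the positive closer to the anchor than the negative by `margin`.
    task_loss = F.triplet_margin_loss(s_a, s_p, s_n, margin=margin)

    # Cross-metric term: align the student's anchor-positive and
    # anchor-negative distances with the teacher's, so the student
    # inherits the teacher's metric structure rather than its raw features.
    d_s_pos = F.pairwise_distance(s_a, s_p)
    d_s_neg = F.pairwise_distance(s_a, s_n)
    d_t_pos = F.pairwise_distance(t_a, t_p)
    d_t_neg = F.pairwise_distance(t_a, t_n)
    distill_loss = F.mse_loss(d_s_pos, d_t_pos) + F.mse_loss(d_s_neg, d_t_neg)

    return alpha * task_loss + beta * distill_loss
```

Distilling pairwise distances rather than raw descriptors has the practical advantage that the student and teacher need not share the same embedding dimensionality; only the relative geometry of places is transferred.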