Ghost-dil-NetVLAD: A Lightweight Neural Network for Visual Place Recognition (2112.11679v2)
Abstract: Visual place recognition (VPR) is a challenging task with the unbalance between enormous computational cost and high recognition performance. Thanks to the practical feature extraction ability of the lightweight convolution neural networks (CNNs) and the train-ability of the vector of locally aggregated descriptors (VLAD) layer, we propose a lightweight weakly supervised end-to-end neural network consisting of a front-ended perception model called GhostCNN and a learnable VLAD layer as a back-end. GhostCNN is based on Ghost modules that are lightweight CNN-based architectures. They can generate redundant feature maps using linear operations instead of the traditional convolution process, making a good trade-off between computation resources and recognition accuracy. To enhance our proposed lightweight model further, we add dilated convolutions to the Ghost module to get features containing more spatial semantic information, improving accuracy. Finally, rich experiments conducted on a commonly used public benchmark and our private dataset validate that the proposed neural network reduces the FLOPs and parameters of VGG16-NetVLAD by 99.04% and 80.16%, respectively. Besides, both models achieve similar accuracy.
- S. Lowry, N. Sünderhauf, P. Newman, J. J. Leonard, D. Cox, P. Corke, and M. J. Milford, “Visual place recognition: A survey,” IEEE Transactions on Robotics, vol. 32, no. 1, pp. 1–19, 2015.
- C. Masone and B. Caputo, “A survey on deep visual place recognition,” IEEE Access, vol. 9, pp. 19 516–19 547, 2021.
- M. Hussain, J. J. Bird, and D. R. Faria, “A study on cnn transfer learning for image classification,” in UK Workshop on computational Intelligence. Springer, 2018, pp. 191–202.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, pp. 1097–1105, 2012.
- K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
- K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, “Ghostnet: More features from cheap operations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1580–1589.
- W.-L. Ku, H.-C. Chou, and W.-H. Peng, “Discriminatively-learned global image representation using cnn as a local feature extractor for image retrieval,” in 2015 Visual Communications and Image Processing (VCIP). IEEE, 2015, pp. 1–4.
- R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “Netvlad: Cnn architecture for weakly supervised place recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 5297–5307.
- X. Zhang, L. Wang, and Y. Su, “Visual place recognition: A survey from deep learning perspective,” Pattern Recognition, vol. 113, p. 107760, 2021.
- W. Zhou, H. Li, and Q. Tian, “Recent advance in content-based image retrieval: A literature survey,” arXiv preprint arXiv:1706.06064, 2017.
- R. Kapoor, D. Sharma, and T. Gulati, “State of the art content based image retrieval techniques using deep learning: a survey,” Multimedia Tools and Applications, pp. 1–23, 2021.
- A. Torii, J. Sivic, T. Pajdla, and M. Okutomi, “Visual place recognition with repetitive structures,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 883–890.
- Y. Zhong, R. Arandjelovi, and A. Zisserman, “Ghostvlad for set-based face recognition,” 2018.
- Y. Cao, J. Zhang, and J. Yu, “Image retrieval via gated multiscale netvlad for social media applications,” IEEE MultiMedia, vol. 27, no. 4, pp. 69–78, 2020.
- A. Khaliq, S. Ehsan, Z. Chen, M. Milford, and K. McDonald-Maier, “A holistic visual place recognition approach using lightweight cnns for significant viewpoint and appearance changes,” IEEE transactions on robotics, vol. 36, no. 2, pp. 561–569, 2019.
- A. Khaliq, S. Ehsan, M. Milford, and K. McDonald-Maier, “Camal: Context-aware multi-scale attention framework for lightweight visual place recognition,” 2019.
- A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, “Neural codes for image retrieval,” in European conference on computer vision. Springer, 2014, pp. 584–599.
- M. Paulin, M. Douze, Z. Harchaoui, J. Mairal, F. Perronin, and C. Schmid, “Local convolutional features with unsupervised training for image retrieval,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 91–99.
- E. Mohedano, K. McGuinness, N. E. O’Connor, A. Salvador, F. Marques, and X. Giró-i Nieto, “Bags of local convolutional features for scalable instance search,” in Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016, pp. 327–331.
- J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, “Image classification with the fisher vector: Theory and practice,” International journal of computer vision, vol. 105, no. 3, pp. 222–245, 2013.
- B. Cao, A. Araujo, and J. Sim, “Unifying deep local and global features for image search,” in European Conference on Computer Vision. Springer, 2020, pp. 726–743.
- J. Yu, C. Zhu, J. Zhang, Q. Huang, and D. Tao, “Spatial pyramid-enhanced netvlad with weighted triplet loss for place recognition,” IEEE transactions on neural networks and learning systems, vol. 31, no. 2, pp. 661–674, 2019.
- Y. Ge, H. Wang, F. Zhu, R. Zhao, and H. Li, “Self-supervising fine-grained region similarities for large-scale image localization,” in European Conference on Computer Vision. Springer, 2020, pp. 369–386.
- S. Hausler, S. Garg, M. Xu, M. Milford, and T. Fischer, “Patch-netvlad: Multi-scale fusion of locally-global descriptors for place recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14 141–14 152.
- Y. Xu, J. Huang, J. Wang, Y. Wang, H. Qin, and K. Nan, “Esa-vlad: A lightweight network based on second-order attention and netvlad for loop closure detection,” IEEE Robotics and Automation Letters, 2021.
- A. Torii, R. Arandjelovic, J. Sivic, M. Okutomi, and T. Pajdla, “24/7 place recognition by view synthesis,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
- M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520.
- A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan et al., “Searching for mobilenetv3,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314–1324.
- X. Zhang, X. Zhou, M. Lin, and J. Sun, “Shufflenet: An extremely efficient convolutional neural network for mobile devices,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 6848–6856.
- N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “Shufflenet v2: Practical guidelines for efficient cnn architecture design,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 116–131.
- H. Jie, S. Li, and S. Gang, “Squeeze-and-excitation networks,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- H. Jégou, M. Douze, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” in 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, 2010, pp. 3304–3311.
- H. Jégou and O. Chum, “Negative evidences and co-occurences in image retrieval: The benefit of pca and whitening,” in European conference on computer vision. Springer, 2012, pp. 774–787.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” in CVPR09, 2009.
- K. He, R. Girshick, and P. Dollár, “Rethinking imagenet pre-training,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 4918–4927.
- D. Hendrycks, K. Lee, and M. Mazeika, “Using pre-training can improve model robustness and uncertainty,” in International Conference on Machine Learning. PMLR, 2019, pp. 2712–2721.
- B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 million image database for scene recognition,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 6, pp. 1452–1464, 2017.
- Qingyuan Gong (6 papers)
- Yu Liu (786 papers)
- Liqiang Zhang (20 papers)
- Renhe Liu (2 papers)