OS-FPI: A Coarse-to-Fine One-Stream Network for UAV Geo-Localization (2403.06148v1)

Published 10 Mar 2024 in eess.IV

Abstract: Geo-localization and navigation of unmanned aerial vehicles (UAVs) in GNSS-denied environments is a prominent research area. Prior approaches mainly employed a two-stream network with non-shared weights to extract features from UAV and satellite images separately, followed by relation modeling to obtain the response map. However, because a two-stream network extracts UAV and satellite features independently, it limits feature-extraction efficiency and increases the computational load. To address these issues, we propose a novel coarse-to-fine one-stream network (OS-FPI) that allows information exchange between UAV and satellite features during early feature extraction. To improve performance, the framework retains feature maps generated at different stages of the extraction process for the feature fusion network and establishes additional connections between UAV and satellite feature maps within that network. It also introduces offset prediction to refine the localization results produced by the classification task. The proposed model matches the inference speed of FPI while using significantly fewer parameters, achieving better performance with fewer parameters under the same conditions. It sets a new state of the art on the UL14 dataset: compared with previous models, it improves the RDS metric by 10.92 points, reaching 76.25, and delivers relative improvements in meter-level localization accuracy of 182.62% at 3-meter accuracy, 164.17% at 5-meter accuracy, and 137.43% at 10-meter accuracy.
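The abstract's central design choice is replacing two independent encoders with a single stream in which UAV and satellite tokens attend to each other throughout feature extraction. The following is a minimal, hypothetical PyTorch sketch of that idea, not the paper's code: the module name, dimensions, patch counts, and the use of a single standard self-attention block are all illustrative assumptions.

```python
# Minimal sketch of one-stream feature extraction (illustrative assumptions,
# not the OS-FPI implementation): UAV and satellite patch tokens are
# concatenated into one sequence, so self-attention exchanges information
# between the two views inside every block, unlike a two-stream design
# that encodes each image separately and only relates them afterwards.
import torch
import torch.nn as nn

class OneStreamBlock(nn.Module):
    """One joint transformer block over concatenated UAV + satellite tokens."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, tokens):
        x = self.norm1(tokens)
        # UAV tokens attend to satellite tokens and vice versa in one pass.
        attn_out, _ = self.attn(x, x, x)
        tokens = tokens + attn_out
        return tokens + self.mlp(self.norm2(tokens))

# Usage: both views share one stream, then are split back per view.
uav_tokens = torch.randn(1, 64, 256)    # e.g. 8x8 UAV patches (assumed sizes)
sat_tokens = torch.randn(1, 256, 256)   # e.g. 16x16 satellite patches
joint = OneStreamBlock()(torch.cat([uav_tokens, sat_tokens], dim=1))
uav_out, sat_out = joint.split([64, 256], dim=1)
print(uav_out.shape, sat_out.shape)     # (1, 64, 256) (1, 256, 256)
```

In a sketch like this, the satellite features that later form the response map have already been conditioned on the UAV view, which is what lets a one-stream design drop the duplicated backbone of a two-stream network.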

Authors (5)
  1. Jiahao Chen (89 papers)
  2. Enhui Zheng (5 papers)
  3. Ming Dai (9 papers)
  4. Yifu Chen (20 papers)
  5. Yusheng Lu (4 papers)
Citations (4)