Convolutional Cross-View Pose Estimation (2303.05915v3)

Published 9 Mar 2023 in cs.CV

Abstract: We propose a novel end-to-end method for cross-view pose estimation. Given a ground-level query image and an aerial image that covers the query's local neighborhood, the 3 Degrees-of-Freedom camera pose of the query is estimated by matching its image descriptor to descriptors of local regions within the aerial image. The orientation-aware descriptors are obtained using a translationally equivariant convolutional ground image encoder and contrastive learning. The Localization Decoder produces a dense probability distribution in a coarse-to-fine manner with a novel Localization Matching Upsampling module. A smaller Orientation Decoder produces a vector field to condition the orientation estimate on the localization. Our method is validated on the VIGOR and KITTI datasets, where it surpasses the state-of-the-art baseline by 72% and 36% in median localization error for comparable orientation estimation accuracy. The predicted probability distribution can represent localization ambiguity and enables rejecting possibly erroneous predictions. Without re-training, the model can infer on ground images with different fields of view and utilize orientation priors if available. On the Oxford RobotCar dataset, our method can reliably estimate the ego-vehicle's pose over time, achieving a median localization error under 1 meter and a median orientation error of around 1 degree at 14 FPS.
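To make the matching step concrete, below is a minimal sketch of how a single ground-image descriptor could be compared against per-location descriptors of the aerial image to produce a dense localization probability map of the kind described in the abstract. This is not the authors' implementation: the tensor shapes, the cosine-style inner-product similarity, the softmax temperature, and the function name dense_localization_probability are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dense_localization_probability(ground_desc, aerial_feats, temperature=0.1):
    """Match a ground descriptor against every local aerial descriptor.

    ground_desc:  (B, C)       orientation-aware ground image descriptor (assumed shape)
    aerial_feats: (B, C, H, W) per-location descriptors of the aerial image (assumed shape)
    Returns a (B, H, W) probability map over aerial locations.
    """
    B, C, H, W = aerial_feats.shape
    # Cosine-style matching: normalize both sides, then take channel-wise inner products.
    g = F.normalize(ground_desc, dim=1)              # (B, C)
    a = F.normalize(aerial_feats, dim=1)             # (B, C, H, W)
    scores = torch.einsum('bc,bchw->bhw', g, a)      # (B, H, W) similarity map
    # A softmax over all aerial locations yields a dense probability distribution;
    # a multi-modal map can express localization ambiguity, as noted in the abstract.
    probs = F.softmax(scores.flatten(1) / temperature, dim=1).view(B, H, W)
    return probs

# Example usage with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    probs = dense_localization_probability(torch.randn(2, 128), torch.randn(2, 128, 64, 64))
    print(probs.shape, probs.flatten(1).sum(dim=1))  # (2, 64, 64); each map sums to 1
```

In the paper, such a distribution is produced coarse-to-fine by the Localization Matching Upsampling module rather than by a single correlation as above; the sketch only illustrates the basic descriptor-matching idea.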
