Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation (2401.17484v3)

Published 30 Jan 2024 in cs.RO, cs.CV, and cs.LG

Abstract: Understanding terrain topology at long range is crucial for the success of off-road robotic missions, especially when navigating at high speed. LiDAR sensors, which are currently heavily relied upon for geometric mapping, provide sparse measurements when mapping at greater distances. To address this challenge, we present a novel learning-based approach capable of predicting terrain elevation maps at long range in real time using only onboard egocentric images. Our proposed method comprises three main elements. First, a transformer-based encoder is introduced that learns cross-view associations between the egocentric views and prior bird's-eye-view elevation map predictions. Second, an orientation-aware positional encoding is proposed to incorporate 3D vehicle pose information over complex unstructured terrain with multi-view visual image features. Lastly, a history-augmented learnable map embedding is proposed to achieve better temporal consistency between elevation map predictions, facilitating downstream navigation tasks. We experimentally validate the applicability of our proposed approach for autonomous off-road robotic navigation in complex and unstructured terrain using real-world off-road driving data. Furthermore, the method is qualitatively and quantitatively compared against current state-of-the-art methods. Extensive field experiments demonstrate that our method surpasses baseline models in accurately predicting terrain elevation while effectively capturing the overall terrain topology at long range. Finally, ablation studies are conducted to highlight and understand the effect of key components of the proposed approach and validate their suitability for improving off-road robotic navigation capabilities.
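
The abstract names three architectural ingredients: a cross-view transformer encoder relating egocentric image features to bird's-eye-view (BEV) elevation queries, an orientation-aware positional encoding derived from the 3D vehicle pose, and a history-augmented learnable map embedding for temporal consistency. The following is a minimal, illustrative PyTorch sketch of how such a decoder might be wired together. It is not the authors' implementation: the class name, layer sizes, 6-DoF pose parameterization, and the simple additive fusion of pose and history are assumptions made purely for illustration.

import torch
import torch.nn as nn

class CrossViewElevationDecoder(nn.Module):
    """Illustrative sketch (not the paper's code): BEV elevation-map queries
    cross-attend to multi-view image features, conditioned on an
    orientation-aware pose encoding and a history-augmented map embedding."""

    def __init__(self, dim=128, heads=4, bev_h=64, bev_w=64):
        super().__init__()
        self.bev_h, self.bev_w = bev_h, bev_w
        # Learnable map embedding carried between timesteps (stands in for the
        # paper's history-augmented map embedding).
        self.map_embed = nn.Parameter(torch.zeros(bev_h * bev_w, dim))
        # Hypothetical orientation-aware positional encoding: project the
        # 3D vehicle pose (x, y, z, roll, pitch, yaw) into the feature space.
        self.pose_mlp = nn.Sequential(nn.Linear(6, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 2), nn.ReLU(), nn.Linear(dim * 2, dim))
        self.head = nn.Linear(dim, 1)  # per-cell elevation estimate

    def forward(self, image_feats, pose, prev_map_embed=None):
        # image_feats: (B, N_tokens, dim) flattened multi-view image features
        # pose:        (B, 6) vehicle pose
        B = image_feats.shape[0]
        queries = self.map_embed.unsqueeze(0).expand(B, -1, -1)
        if prev_map_embed is not None:
            # Fuse the previous frame's map embedding for temporal consistency.
            queries = queries + prev_map_embed
        queries = queries + self.pose_mlp(pose).unsqueeze(1)
        attended, _ = self.cross_attn(queries, image_feats, image_feats)
        updated = attended + self.ffn(attended)
        elevation = self.head(updated).view(B, self.bev_h, self.bev_w)
        return elevation, updated  # embedding is fed back in at the next frame

# Hypothetical usage on dummy tensors:
model = CrossViewElevationDecoder()
feats = torch.randn(2, 3 * 14 * 14, 128)  # e.g. 3 camera views, 14x14 tokens each
pose = torch.randn(2, 6)
elev, mem = model(feats, pose)             # (2, 64, 64) elevation map + memory
elev_next, _ = model(feats, pose, prev_map_embed=mem)

Feeding the returned embedding back in at the next timestep is one simple way to realize the recurrent, history-augmented behavior the abstract describes; the paper's actual fusion and supervision scheme may differ.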
