RoadBEV: Road Surface Reconstruction in Bird's Eye View (2404.06605v3)

Published 9 Apr 2024 in cs.CV

Abstract: Road surface conditions, especially geometry profiles, enormously affect the driving performance of autonomous vehicles. Vision-based online road reconstruction promisingly captures road information in advance. Existing solutions like monocular depth estimation and stereo matching suffer from modest performance. The recent technique of Bird's-Eye-View (BEV) perception provides immense potential for more reliable and accurate reconstruction. This paper proposes two simple yet effective models for road elevation reconstruction in BEV, named RoadBEV-mono and RoadBEV-stereo, which estimate road elevation with monocular and stereo images, respectively. The former directly fits elevation values based on voxel features queried from the image view, while the latter efficiently recognizes road elevation patterns based on a BEV volume representing the correlation between left and right voxel features. Insightful analyses reveal their consistency with, and differences from, the perspective view. Experiments on a real-world dataset verify the models' effectiveness and superiority. The elevation errors of RoadBEV-mono and RoadBEV-stereo reach 1.83 cm and 0.50 cm, respectively. Our models are promising for practical road preview, providing essential information for promoting the safety and comfort of autonomous vehicles. The code is released at https://github.com/ztsrxh/RoadBEV

Summary

  • The paper presents RoadBEV-mono and RoadBEV-stereo, two models that use BEV perspectives to accurately estimate road elevations from monocular and stereo images.
  • The methodology leverages a classification approach over elevation bins for monocular images and a 4D BEV cost volume for stereo pairs, reducing elevation errors to 1.83 cm and 0.50 cm, respectively.
  • The results demonstrate that BEV-based road surface reconstruction enhances autonomous vehicle perception and opens new avenues for advanced 3D reconstruction research.

Overview of RoadBEV: Road Surface Reconstruction in Bird's Eye View

The paper "RoadBEV: Road Surface Reconstruction in Bird's Eye View" advances road surface modeling for the perception systems of autonomous vehicles. It centers on leveraging Bird's-Eye-View (BEV) perception to reconstruct road surfaces from monocular and stereo images. The authors introduce two models, RoadBEV-mono and RoadBEV-stereo, dedicated to estimating road elevation from monocular and stereo imagery, respectively.

Key Features and Models

In response to the inherent limitations of monocular depth estimation and stereo matching in the perspective view, the authors propose BEV-based models for road elevation estimation, an approach better aligned with the vertical variation patterns of road surfaces. Both models first build a BEV grid of voxel features queried from image-view features, sketched below.
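
To make that shared first step concrete, here is a minimal, illustrative PyTorch sketch of querying BEV voxel features from an image-view feature map by pinhole projection and bilinear sampling. This is not the authors' released code: the function name, tensor shapes, and the assumption of a single rectified, distortion-free camera are ours.

```python
import torch
import torch.nn.functional as F

def query_voxel_features(img_feat, voxel_xyz, K):
    """Sample image features at the projections of BEV voxel centers.

    img_feat:  (B, C, Hf, Wf) image-view feature map
    voxel_xyz: (B, N, 3) voxel centers in camera coordinates (meters)
    K:         (3, 3) camera intrinsics, scaled to the feature map
    returns:   (B, C, N) per-voxel features
    """
    # Pinhole projection: [u*z, v*z, z]^T = K @ [x, y, z]^T
    uvz = voxel_xyz @ K.transpose(0, 1)
    uv = uvz[..., :2] / uvz[..., 2:3].clamp(min=1e-6)

    # Normalize pixel coordinates to [-1, 1], as grid_sample expects.
    Hf, Wf = img_feat.shape[-2:]
    grid = torch.stack(
        [2.0 * uv[..., 0] / (Wf - 1) - 1.0,
         2.0 * uv[..., 1] / (Hf - 1) - 1.0], dim=-1)     # (B, N, 2)

    # Treat the voxel list as a 1 x N "image"; voxels projecting outside
    # the view receive zero features (default zero padding).
    sampled = F.grid_sample(img_feat, grid.unsqueeze(1),
                            mode="bilinear", align_corners=True)  # (B, C, 1, N)
    return sampled.squeeze(2)
```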

  • RoadBEV-mono: This model estimates road elevation from a single image by classifying BEV voxel features over discretized elevation bins, searching for road surface structure directly along the elevation direction rather than the traditional depth direction. This alignment with the BEV perspective significantly improves estimation performance over perspective-view monocular depth estimation.
  • RoadBEV-stereo: For stereo configurations, a 4D cost volume is constructed in BEV from the correlation between left and right voxel features, so matching is performed directly over elevation hypotheses in the BEV grid. This yields higher precision on fine road undulations than traditional perspective-view stereo matching (minimal sketches of both estimation heads follow this list).
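
The two estimation heads can likewise be sketched in a few lines. The following is a hedged illustration under assumed shapes and elevation range, not the paper's implementation: the mono head classifies each BEV cell over elevation bins and takes a soft expectation, while the stereo path builds a 4D volume via group-wise correlation of left and right voxel features (group-wise correlation is one common choice for correlation volumes; the paper's exact construction may differ).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElevationHead(nn.Module):
    """RoadBEV-mono-style head (illustrative): classify each BEV cell over
    discretized elevation bins, then take the probability-weighted mean
    (soft argmax) to obtain a continuous elevation value in meters."""

    def __init__(self, in_ch=64, num_bins=50, elev_range=(-0.2, 0.2)):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, num_bins, kernel_size=1)
        # Bin centers spanning the assumed elevation range.
        self.register_buffer("bin_centers",
                             torch.linspace(*elev_range, num_bins))

    def forward(self, bev_feat):                         # (B, C, H, W)
        prob = F.softmax(self.cls(bev_feat), dim=1)      # (B, K, H, W)
        return (prob * self.bin_centers.view(1, -1, 1, 1)).sum(dim=1)

def stereo_bev_volume(left_vox, right_vox, num_groups=8):
    """RoadBEV-stereo-style 4D volume (illustrative): group-wise correlation
    between left and right voxel features over K elevation bins.

    left_vox, right_vox: (B, C, K, H, W), with C divisible by num_groups
    returns:             (B, G, K, H, W) correlation volume
    """
    B, C, K, H, W = left_vox.shape
    l = left_vox.view(B, num_groups, C // num_groups, K, H, W)
    r = right_vox.view(B, num_groups, C // num_groups, K, H, W)
    return (l * r).mean(dim=2)
```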

Experimental Findings

The models were evaluated on a real-world road surface dataset and benchmarked against existing methods. RoadBEV-mono achieved an elevation error of 1.83 cm, and RoadBEV-stereo reduced this to 0.50 cm. These results mark a substantial improvement over existing methodologies; in particular, the monocular BEV model cuts the elevation error by roughly 50% relative to traditional perspective-view monocular depth estimation.
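
As a point of reference for how such elevation errors are typically computed, below is a small, assumed evaluation helper; the benchmark's exact metric definition and validity masking may differ, and `elevation_mae_cm` is a hypothetical name.

```python
import torch

def elevation_mae_cm(pred_m: torch.Tensor, gt_m: torch.Tensor,
                     valid: torch.Tensor) -> float:
    """Mean absolute elevation error in centimeters over valid BEV cells.

    pred_m, gt_m: (H, W) elevation maps in meters; valid: (H, W) bool mask.
    """
    err_m = (pred_m - gt_m).abs()[valid]
    return 100.0 * err_m.mean().item()
```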

Implications and Future Directions

The findings from this work have compelling implications for autonomous vehicle technology. The proposed BEV models address critical challenges in road surface perception by estimating elevation variations directly, a formulation that aligns better with actual road profiles. This provides a practical advantage for real-time use in autonomous systems, where accurate road surface information can significantly enhance motion planning and control, thereby improving passenger comfort and vehicle safety.

From a theoretical standpoint, the transition from perspective to BEV in core road reconstruction tasks lays the groundwork for new approaches in other 3D reconstruction tasks, such as texture and geometry recovery. Future research might involve integrating more comprehensive multi-view and multi-sensor data to enhance the robustness of BEV perception models. Additionally, further algorithmic advancements could focus on refining voxel queries and optimizing deep learning architectures for improved accuracy and efficiency.

Overall, while challenges remain in scaling these models and integrating them into broader autonomous vehicle systems, the research presented in this paper represents a significant step forward in leveraging BEV perception for road surface estimation. The development of RoadBEV provides both pragmatic solutions and a strategic pathway for future investigations in road surface reconstruction and allied autonomous driving technologies.
