
Online Vectorized HD Map Construction using Geometry (2312.03341v2)

Published 6 Dec 2023 in cs.CV and cs.AI

Abstract: The construction of online vectorized High-Definition (HD) maps is critical for downstream prediction and planning. Recent efforts have built strong baselines for this task, however, shapes and relations of instances in urban road systems are still under-explored, such as parallelism, perpendicular, or rectangle-shape. In our work, we propose GeMap ($\textbf{Ge}$ometry $\textbf{Map}$), which end-to-end learns Euclidean shapes and relations of map instances beyond basic perception. Specifically, we design a geometric loss based on angle and distance clues, which is robust to rigid transformations. We also decouple self-attention to independently handle Euclidean shapes and relations. Our method achieves new state-of-the-art performance on the NuScenes and Argoverse 2 datasets. Remarkably, it reaches a 71.8% mAP on the large-scale Argoverse 2 dataset, outperforming MapTR V2 by +4.4% and surpassing the 70% mAP threshold for the first time. Code is available at https://github.com/cnzzx/GeMap.


Summary

  • The paper introduces GeMap, a novel framework utilizing geometric loss functions and geometry-decoupled attention to enhance online HD map construction.
  • The approach employs Euclidean shape and relation clues, BEV feature extraction, and rotation-invariant representations to achieve superior mapping accuracy.
  • Experimental results show GeMap achieving 71.8% mAP on Argoverse 2, a +4.4% improvement over the previous best method, MapTR V2, and the first result to cross the 70% mAP threshold.

An In-depth Review of GeMap: A Framework for Online Vectorized HD Map Construction

The paper "Online Vectorized HD Map Construction using Geometry" proposes GeMap, a novel approach to efficiently constructing vectorized high-definition maps, a task critical for autonomous driving systems. The research focuses on modeling and exploiting the geometric regularities of urban road systems, such as parallelism, perpendicularity, and rectangular shapes, which the perception-centric techniques of existing methods leave largely unexplored.

GeMap Framework

GeMap introduces an innovative framework that learns Euclidean shapes and spatial relations of map instances, thereby allowing for more effective map construction. The framework comprises several key components:

  1. Geometric Loss: Central to this work is a geometric loss built on angle and distance clues, which makes it robust to rigid transformations (translations and rotations). This loss function, termed the Euclidean Loss, incorporates two main elements:
    • Euclidean Shape Clues, which focus on the shape of each map instance.
    • Euclidean Relation Clues, which capture the relational properties between multiple map instances.
  2. Geometry-Decoupled Attention (GDA): A second key innovation is GDA, an adapted attention mechanism that decouples self-attention so that the Euclidean shape of each instance and the relations between instances are handled independently. This separation simplifies the learning of complex geometries.
  3. BEV Representation and G-Representation: The use of a Bird's-Eye-View (BEV) feature extractor over multi-view images lays the groundwork for the model. Complementing this, a translation- and rotation-invariant representation, termed the G-Representation, enables the model to exploit instance geometry effectively.
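The invariance idea behind these geometric clues can be illustrated with a minimal sketch (function names here are illustrative, not the paper's API): per-edge lengths act as distance clues, cosines of the angles between consecutive edges act as angle clues, and an L1 penalty between predicted and ground-truth clues is unchanged under any rotation or translation of the instance.

```python
import numpy as np

def shape_clues(points: np.ndarray):
    """Rigid-transform-invariant shape clues for one map instance,
    given as an (N, 2) array of polyline vertices.

    Returns per-edge lengths (distance clues) and cosines of angles
    between consecutive edge vectors (angle clues); both are unchanged
    by any translation or rotation of the input."""
    edges = np.diff(points, axis=0)            # (N-1, 2) edge vectors
    lengths = np.linalg.norm(edges, axis=1)    # distance clues
    # Cosine of the angle between consecutive edges (angle clues).
    dots = np.sum(edges[:-1] * edges[1:], axis=1)
    cosines = dots / (lengths[:-1] * lengths[1:] + 1e-8)
    return lengths, cosines

def geometric_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """L1 discrepancy between predicted and ground-truth shape clues."""
    pl, pc = shape_clues(pred)
    gl, gc = shape_clues(gt)
    return float(np.abs(pl - gl).mean() + np.abs(pc - gc).mean())
```

Because the clues depend only on relative point positions, rigidly transforming a polyline leaves the loss at (numerically) zero, whereas an actual shape change, such as scaling, does not.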

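A simplified single-head sketch of the decoupling idea behind GDA (illustrative only; the actual mechanism uses learned projections, multiple heads, and the G-Representation): attention first mixes points within each instance (shape), then mixes instances at each point index (relations), instead of attending jointly over all points of all instances.

```python
import numpy as np

def attention(q, k, v):
    """Plain scaled dot-product attention, single head, no projections."""
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def decoupled_attention(x: np.ndarray) -> np.ndarray:
    """Decoupled self-attention over an (I, P, D) tensor of I map
    instances, each with P point embeddings of dimension D.

    Step 1 attends within each instance (shape); step 2 transposes and
    attends across instances (relations). Compared with joint attention
    over all I*P tokens, this sketch scales as O(I*P^2 + P*I^2) rather
    than O((I*P)^2)."""
    intra = attention(x, x, x)                    # (I, P, D): points within an instance
    xt = intra.swapaxes(0, 1)                     # (P, I, D): instances per point index
    inter = attention(xt, xt, xt).swapaxes(0, 1)  # back to (I, P, D)
    return inter
```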
Implementation and Results

GeMap is evaluated on the NuScenes and Argoverse 2 datasets, where it achieves new state-of-the-art performance. Most notably, it reaches 71.8% mAP on the large-scale Argoverse 2 dataset, outperforming MapTR V2 by +4.4% and crossing the 70% mAP threshold for the first time. The approach is also computationally efficient, maintaining competitive FPS at inference.

Implications and Future Directions

The findings underscore the potential of geometric properties to enhance the robustness and accuracy of HD map construction. GeMap's explicit separation of shapes and relations gives networks a structured way to interpret complex mapping environments, supporting more reliable autonomous driving systems.

Theoretically, this research probes how deeply geometric priors can be integrated into neural network-based systems. Practically, approaches like GeMap may become integral components of real-time applications where accuracy and robustness to coordinate transformations are paramount.

Future developments could explore richer geometric representations or more intricate geometric patterns, potentially extending the benefits observed here to broader navigation and spatial-awareness tasks in AI-driven systems. Exploring how geometric features cope with partial occlusions or adverse conditions in autonomous settings is another promising avenue for subsequent research.

Overall, GeMap provides a robust framework addressing key challenges in online vectorized HD map construction, enhancing both the theoretical and practical paradigms within this evolving field of autonomous technology. The work establishes a firm basis for subsequent investigations aimed at refining and extending the described methodologies.
