MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction (2208.14437v2)

Published 30 Aug 2022 in cs.CV and cs.RO

Abstract: High-definition (HD) map provides abundant and precise environmental information of the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving system. We present MapTR, a structured end-to-end Transformer for efficient online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling map element as a point set with a group of equivalent permutations, which accurately describes the shape of map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. MapTR achieves the best performance and efficiency with only camera input among existing vectorized map construction approaches on nuScenes dataset. In particular, MapTR-nano runs at real-time inference speed (25.1 FPS) on RTX 3090, 8× faster than the existing state-of-the-art camera-based method while achieving 5.0 higher mAP. Even compared with the existing state-of-the-art multi-modality method, MapTR-nano achieves 0.7 higher mAP, and MapTR-tiny achieves 13.5 higher mAP and 3× faster inference speed. Abundant qualitative results show that MapTR maintains stable and robust map construction quality in complex and various driving scenes. MapTR is of great application value in autonomous driving. Code and more demos are available at https://github.com/hustvl/MapTR.

Citations (169)

Summary

  • The paper introduces MapTR, an end-to-end Transformer model that resolves permutation ambiguity for real-time vectorized HD map construction.
  • It employs a hierarchical query embedding scheme with bipartite matching to precisely align predicted map components with their ground truth.
  • MapTR-nano runs in real time (25.1 FPS on an RTX 3090), and MapTR-tiny achieves 13.5 higher mAP than the previous state-of-the-art multi-modality method while running 3× faster.

Structured Modeling and Learning for Efficient Online Vectorized HD Map Construction

In autonomous driving, high-definition (HD) maps provide detailed environmental information that supports vehicle planning and navigation. The paper introduces MapTR, a structured end-to-end Transformer for efficient, real-time construction of vectorized HD maps from camera input alone.

Methodological Innovations

MapTR is an end-to-end Transformer built on a permutation-equivalent modeling approach. This addresses an inherent ambiguity in map element representation: elements such as polylines and polygons have multiple valid orderings when written as a sequence of points. A polyline can be traced from either end, and a polygon outline can start at any vertex and run in either direction. By modeling each element as a point set together with its group of equivalent permutations, the authors avoid imposing one arbitrary ordering as the only correct target, which stabilizes the learning process.
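To make the idea of equivalent permutations concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an open polyline has two equivalent orderings (forward and reversed) and a closed polygon with N distinct vertices has 2N (any start vertex, either direction), and that a polygon is stored without repeating its first vertex at the end. The function name and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def equivalent_permutations(points: List[Point], closed: bool) -> List[List[Point]]:
    """Return every point ordering that describes the same shape.

    Open polyline: 2 orderings (forward and reversed).
    Closed polygon: 2 * N orderings (any start vertex, either direction).
    """
    forward = list(points)
    backward = forward[::-1]
    if not closed:
        return [forward, backward]
    orderings = []
    for direction in (forward, backward):
        for start in range(len(direction)):
            # Cyclic shift: begin the outline at a different vertex.
            orderings.append(direction[start:] + direction[:start])
    return orderings


# Hypothetical map elements in bird's-eye-view coordinates.
lane_divider = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]
crossing = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]

print(len(equivalent_permutations(lane_divider, closed=False)))  # 2
print(len(equivalent_permutations(crossing, closed=True)))       # 8
```

Any of these orderings is an equally valid regression target, so a training loss can compare a prediction against all of them and keep the closest one.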

The hierarchical query embedding scheme is the second main contribution. It encodes instance-level and point-level information separately, which allows map elements and their points to be predicted in parallel. During training, hierarchical bipartite matching first assigns predicted map elements to ground-truth elements and then aligns the points within each matched pair.
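The two-level matching can be sketched as follows. This is a simplified illustration under several assumptions, not the paper's training code: the cost is a plain Manhattan distance between points, class scores are ignored, every element has the same number of points, there are at least as many predictions as ground-truth elements, and a brute-force search over assignments stands in for the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) that a real implementation would use. All function names and coordinates are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Element = Tuple[List[Point], bool]  # (points, is_closed_polygon)


def orderings(points: List[Point], closed: bool) -> List[List[Point]]:
    """Equivalent orderings: 2 for a polyline, 2 * N for a polygon."""
    backward = points[::-1]
    if not closed:
        return [points, backward]
    return [d[s:] + d[:s] for d in (points, backward) for s in range(len(d))]


def l1(pred: List[Point], target: List[Point]) -> float:
    """Summed Manhattan distance between corresponding points."""
    return sum(abs(px - tx) + abs(py - ty)
               for (px, py), (tx, ty) in zip(pred, target))


def point_level_match(pred: List[Point], gt: Element) -> Tuple[float, List[Point]]:
    """Pick the ground-truth ordering that is closest to the prediction."""
    gt_points, closed = gt
    return min(((l1(pred, o), o) for o in orderings(gt_points, closed)),
               key=lambda pair: pair[0])


def hierarchical_match(preds: List[List[Point]],
                       gts: List[Element]) -> Dict[int, Tuple[int, List[Point]]]:
    """Map prediction index -> (ground-truth index, best target ordering)."""
    cost = [[point_level_match(p, g)[0] for g in gts] for p in preds]
    # Instance level: one-to-one assignment with the lowest total cost.
    # assignment[j] is the prediction index assigned to ground truth j.
    assignment = min(permutations(range(len(preds)), len(gts)),
                     key=lambda a: sum(cost[a[j]][j] for j in range(len(gts))))
    # Point level: fix the ordering of each matched ground-truth element.
    return {assignment[j]: (j, point_level_match(preds[assignment[j]], gts[j])[1])
            for j in range(len(gts))}


gts = [
    ([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], False),  # open polyline
    ([(0.0, 2.0), (1.0, 3.0), (2.0, 2.0)], True),   # closed polygon
]
preds = [
    [(1.0, 3.0), (2.0, 2.0), (0.0, 2.0)],  # polygon, different start vertex
    [(5.0, 5.0), (6.0, 5.0), (7.0, 5.0)],  # far from everything, stays unmatched
    [(2.0, 0.0), (1.0, 0.0), (0.0, 0.0)],  # polyline traced in reverse
]
matches = hierarchical_match(preds, gts)
print(matches)
```

In this example, predictions 0 and 2 are matched at zero cost even though their point order differs from the stored ground truth, while prediction 1 is left unmatched.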

Numerical Outcomes

MapTR is evaluated on the nuScenes dataset, a widely used benchmark in autonomous driving research. Using only camera input, MapTR-nano runs in real time at 25.1 FPS on an RTX 3090, 8× faster than the previous state-of-the-art camera-based method, while achieving 5.0 higher mAP. Compared with the previous state-of-the-art multi-modality method, MapTR-nano still achieves 0.7 higher mAP, and MapTR-tiny achieves 13.5 higher mAP with 3× faster inference.

Implications and Future Directions

MapTR's efficiency and accuracy point to practical value in real-world autonomous driving systems. Because it constructs HD maps online and in real time, it could reduce reliance on extensive offline mapping processes, which in turn could lower costs and simplify map maintenance.

From a theoretical perspective, the permutation-equivalent framework could have broader applications beyond HD map construction. Dealing with representational ambiguity is a common challenge in computer vision and other domains involving sequential data. Thus, future research could explore the adaptability of this approach across different types of neural architectures and tasks.

Moreover, the demonstrated combination of high accuracy and low latency paves the way for exploring extensions involving different sensor modalities. While this paper highlights achievements with camera inputs, the inclusion of LiDAR or radar data could further enhance map robustness, especially in challenging environmental conditions.

Conclusion

MapTR represents a significant step forward in the efficient online construction of vectorized HD maps for autonomous vehicles. Its innovative modeling approach and hierarchical processing make it a promising addition to the toolkit for building robust environmental perception systems. Future work could expand on these findings, exploring integration with other sensor technologies and broader applications in intelligent transportation systems.