MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction (2404.00876v1)

Published 1 Apr 2024 in cs.CV

Abstract: Currently, high-definition (HD) map construction leans towards a lightweight online generation tendency, which aims to preserve timely and reliable road scene information. However, map elements contain strong shape priors. Subtle and sparse annotations make current detection-based frameworks ambiguous in locating relevant feature scopes and cause the loss of detailed structures in prediction. To alleviate these problems, we propose MGMap, a mask-guided approach that effectively highlights the informative regions and achieves precise map element localization by introducing the learned masks. Specifically, MGMap employs learned masks based on the enhanced multi-scale BEV features from two perspectives. At the instance level, we propose the Mask-activated instance (MAI) decoder, which incorporates global instance and structural information into instance queries by the activation of instance masks. At the point level, a novel position-guided mask patch refinement (PG-MPR) module is designed to refine point locations from a finer-grained perspective, enabling the extraction of point-specific patch information. Compared to the baselines, our proposed MGMap achieves a notable improvement of around 10 mAP for different input modalities. Extensive experiments also demonstrate that our approach showcases strong robustness and generalization capabilities. Our code can be found at https://github.com/xiaolul2/MGMap.

References (51)
  1. nuScenes: A multimodal dataset for autonomous driving. In CVPR, 2020.
  2. Structured bird’s-eye-view traffic scene understanding from onboard images. In ICCV, pages 15661–15670, 2021.
  3. Per-pixel classification is not all you need for semantic segmentation. In NeurIPS, pages 17864–17875, 2021.
  4. Masked-attention mask transformer for universal image segmentation. In CVPR, pages 1290–1299, 2022a.
  5. Boundary-preserving mask r-cnn. In ECCV, pages 660–676. Springer, 2020.
  6. Sparse instance activation for real-time instance segmentation. In CVPR, pages 4433–4442, 2022b.
  7. Path-aware graph attention for hd maps in motion prediction. In ICRA, pages 6430–6436. IEEE, 2022.
  8. Hgformer: Hierarchical grouping transformer for domain generalized semantic segmentation. In CVPR, pages 15413–15423, 2023a.
  9. Pivotnet: Vectorized pivot learning for end-to-end hd map construction. In ICCV, pages 3672–3682, 2023b.
  10. Superfusion: Multilevel lidar-camera fusion for long-range hd map generation and prediction. arXiv preprint arXiv:2211.15656, 2022.
  11. Vectornet: Encoding hd maps and agent dynamics from vectorized representation. In CVPR, pages 11525–11533, 2020.
  12. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  13. Mask r-cnn. In ICCV, pages 2961–2969, 2017.
  14. Egovm: Achieving precise ego-localization using lightweight vectorized maps. arXiv preprint arXiv:2307.08991, 2023.
  15. Planning-oriented autonomous driving. In CVPR, pages 17853–17862, 2023.
  16. Bevdet: High-performance multi-camera 3d object detection in bird-eye-view. arXiv preprint arXiv:2112.11790, 2021.
  17. Perceive, interact, predict: Learning dynamic and static clues for end-to-end motion prediction. arXiv preprint arXiv:2212.02181, 2022.
  18. Map-based precision vehicle localization in urban environments. In Robotics: Science and Systems, 2007.
  19. Hdmapnet: An online hd map construction and evaluation framework. In ICRA, pages 4628–4634. IEEE, 2022a.
  20. Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. In ECCV, pages 1–18. Springer, 2022b.
  21. Polytransform: Deep polygon transformer for instance segmentation. In CVPR, pages 9131–9140, 2020.
  22. Bevfusion: A simple and robust lidar-camera fusion framework. In NeurIPS, 2022.
  23. Maptr: Structured modeling and learning for online vectorized hd map construction. In ICLR, 2022.
  24. Maptrv2: An end-to-end framework for online vectorized hd map construction. arXiv preprint arXiv:2308.05736, 2023.
  25. Petr: Position embedding transformation for multi-view 3d object detection. In ECCV, pages 531–548. Springer, 2022.
  26. Vectormapnet: End-to-end vectorized hd map learning. In ICML, pages 22352–22369. PMLR, 2023a.
  27. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In ICRA, pages 2774–2781. IEEE, 2023b.
  28. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 3DV, pages 565–571. IEEE, 2016.
  29. Cross-view semantic segmentation for sensing surroundings. IEEE Robotics and Automation Letters, 5(3):4867–4873, 2020.
  30. Bevsegformer: Bird’s eye view semantic segmentation from arbitrary camera rigs. In WACV, pages 5935–5943, 2023.
  31. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. In ECCV, pages 194–210. Springer, 2020.
  32. End-to-end vectorized hd-map construction with piecewise bezier curve. In CVPR, pages 13218–13228, 2023.
  33. Unifusion: Unified multi-view fusion transformer for spatial-temporal representation in bird’s-eye-view. In ICCV, pages 8690–8699, 2023.
  34. On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237, 2019.
  35. A sim2real deep learning approach for the transformation of images from multiple vehicle-mounted cameras to a semantically segmented image in bird’s eye view. In ITSC, pages 1–7. IEEE, 2020.
  36. Predicting semantic map representations from images using pyramid occupancy networks. In CVPR, pages 11138–11147, 2020.
  37. Lego-loam: Lightweight and ground-optimized lidar odometry and mapping on variable terrain. In IROS, pages 4758–4765. IEEE, 2018.
  38. Lio-sam: Tightly-coupled lidar inertial odometry via smoothing and mapping. In IROS, pages 5135–5142. IEEE, 2020.
  39. Instagram: Instance-level graph modeling for vectorized hd map learning. In CVPRW, 2023.
  40. Gated-scnn: Gated shape cnns for semantic segmentation. In ICCV, pages 5229–5238, 2019.
  41. Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML, pages 6105–6114. PMLR, 2019.
  42. Look closer to segment better: Boundary patch refinement for instance segmentation. In CVPR, pages 13926–13935, 2021.
  43. Lidar2map: In defense of lidar-based semantic map construction using online camera distillation. In CVPR, pages 5186–5195, 2023.
  44. Fcos3d: Fully convolutional one-stage monocular 3d object detection. In ICCV, pages 913–922, 2021.
  45. Detr3d: 3d object detection from multi-view images via 3d-to-2d queries. In CoRL, pages 180–191. PMLR, 2022.
  46. Argoverse 2: Next generation datasets for self-driving perception and forecasting. In NeurIPS, 2021.
  47. Cbam: Convolutional block attention module. In ECCV, pages 3–19, 2018.
  48. Second: Sparsely embedded convolutional detection. Sensors, 18(10):3337, 2018.
  49. Online map vectorization for autonomous driving: A rasterization perspective. In NeurIPS, 2023.
  50. Beverse: Unified perception and prediction in birds-eye-view for vision-centric autonomous driving. arXiv preprint arXiv:2205.09743, 2022.
  51. Deformable detr: Deformable transformers for end-to-end object detection. In ICLR, 2020.

Summary

  • The paper presents a novel mask-guided framework that refines HD map construction by enhancing mask-activated decoding and point-level localization.
  • The approach integrates multi-scale BEV feature extraction with attention mechanisms to overcome challenges from sparse annotations, achieving approximately a 10 mAP improvement.
  • MGMap offers a scalable, real-time mapping solution crucial for autonomous driving, accurately delineating lanes, road boundaries, and pedestrian crossings.

An Analysis of MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction

The paper "MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction" presents a framework for high-precision, real-time vectorized high-definition (HD) map construction, a capability essential for autonomous driving. The work addresses inherent challenges in current detection-based map construction pipelines, primarily the sparse and subtle annotations that lead to ambiguous feature localization and the loss of detailed structures in predictions.

Methodological Framework and Innovations

The proposed MGMap introduces a mask-guided learning framework that employs learned masks for more precise localization of map elements. The framework comprises three major components: BEV feature extraction with an enhanced neck, the Mask-Activated Instance (MAI) decoder, and the Position-Guided Mask Patch Refinement (PG-MPR) module.

  1. BEV Feature Extraction: The methodology employs a pyramidal structure, the Enhanced Multi-Level (EML) neck, to fuse multi-scale Bird’s-Eye-View (BEV) features into a comprehensive representation of the driving environment. Channel and spatial attention mechanisms within the EML neck capture the diverse semantic and positional cues required for precise localization of complex map structures.
  2. Mask-Activated Instance Decoder: The technique introduces mask-activated queries initialized from dynamically generated instance masks. These masks enhance the process of harnessing instance-specific information, allowing the decoder to augment query embeddings with both shape priors and global instance attributes, promoting a refined understanding of map elements.
  3. Position-Guided Mask Patch Refinement: The PG-MPR module addresses localized point-level refinement by drawing on dense patch features extracted from binary masks. This step refines point positions through interaction with local mask patches, thus boosting the fidelity of structure delineation and precise localization.
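The attention gating inside the EML neck follows the familiar channel-then-spatial pattern of CBAM (reference 47). Below is a minimal NumPy sketch of that pattern, with toy shapes and the learned MLP/convolution weights omitted, so it illustrates only the gating structure rather than the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Pool spatially, then gate each channel.
    avg = feat.mean(axis=(1, 2))          # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))            # (C,) max-pooled descriptor
    gate = sigmoid(avg + mx)              # shared MLP omitted for brevity
    return feat * gate[:, None, None]

def spatial_attention(feat):
    # feat: (C, H, W). Pool over channels, then gate each BEV location.
    avg = feat.mean(axis=0)               # (H, W)
    mx = feat.max(axis=0)                 # (H, W)
    gate = sigmoid(avg + mx)              # 7x7 conv omitted for brevity
    return feat * gate[None, :, :]

bev = np.random.rand(64, 50, 100)         # toy single-scale BEV feature map
refined = spatial_attention(channel_attention(bev))
print(refined.shape)                      # (64, 50, 100)
```

In the real network the pooled descriptors pass through learned layers before the sigmoid; the sketch keeps only the pool-gate-rescale skeleton.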
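How a learned instance mask "activates" a query can be illustrated with masked average pooling: each soft mask weights the BEV features, and the weighted mean becomes that instance's query embedding. The function name and shapes below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def mask_activated_queries(bev_feat, inst_masks, eps=1e-6):
    """Pool BEV features under each soft instance mask into one query each.

    bev_feat:   (C, H, W) BEV feature map
    inst_masks: (N, H, W) per-instance soft masks in [0, 1]
    returns:    (N, C) mask-activated query embeddings
    """
    C, H, W = bev_feat.shape
    flat_feat = bev_feat.reshape(C, -1)                   # (C, H*W)
    flat_mask = inst_masks.reshape(len(inst_masks), -1)   # (N, H*W)
    weighted = flat_mask @ flat_feat.T                    # (N, C) mask-weighted sums
    return weighted / (flat_mask.sum(axis=1, keepdims=True) + eps)

bev = np.random.rand(256, 50, 100)
masks = np.random.rand(20, 50, 100)        # 20 soft instance masks
queries = mask_activated_queries(bev, masks)
print(queries.shape)                       # (20, 256)
```

Because the pooling is mask-weighted, each query aggregates global shape and location evidence from exactly the region its mask highlights, which is the intuition behind the MAI decoder's mask activation.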

Numerical Results and Performance Insights

Experimental results on the nuScenes and Argoverse 2 datasets underline MGMap's robustness and state-of-the-art performance across input modalities, including camera, LiDAR, and their fusion. MGMap achieves an improvement of roughly 10 mean Average Precision (mAP) over baselines such as MapTR on both Chamfer-distance-based and raster-based metrics. Notably, MGMap maintains high performance across diverse weather and lighting conditions, demonstrating substantial generalization capability.
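The Chamfer-distance matching underlying these mAP metrics compares sampled points on predicted and ground-truth polylines; a prediction counts as a true positive when the symmetric Chamfer distance falls under a threshold (0.5 m, 1.0 m, and 1.5 m are the thresholds commonly used in this line of work). A small NumPy sketch with toy polylines:

```python
import numpy as np

def chamfer_distance(pred_pts, gt_pts):
    """Symmetric Chamfer distance between two (N, 2) point sets."""
    # Pairwise Euclidean distances: (N_pred, N_gt)
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    # Average nearest-neighbor distance in both directions.
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

# Toy example: a predicted lane divider laterally offset from ground truth.
gt = np.stack([np.linspace(0, 10, 20), np.zeros(20)], axis=1)
pred = gt + np.array([0.0, 0.4])           # shifted 0.4 m sideways
cd = chamfer_distance(pred, gt)
print(round(cd, 3))                        # 0.4 -> a match under a 0.5 m threshold
```

The evaluation code in the referenced benchmarks additionally resamples each polyline to a fixed number of points before computing the distance; that resampling step is omitted here.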

Implications and Future Directions

The implications of this research are manifold. Practically, MGMap offers a scalable solution for generating detailed, real-time vectorized HD maps essential for self-driving vehicles, capable of providing accurate lane, road boundary, and pedestrian crossing information. Theoretically, the mask-guided framework introduces a paradigm shift from coarse instance-level to fine-grained point-level representations, leveraging attention-based mechanisms to refine these representations in an end-to-end manner.

In future research, MGMap can be integrated with multi-modal and temporal data sources, facilitating even richer data representations, which can be pivotal in maneuvering complex driving scenarios. Exploring unsupervised or semi-supervised learning strategies could also enable the framework to adapt to unseen environments or novel geographical regions, further extending its practical applicability.

Conclusion

This paper delivers significant contributions to online HD map construction by harnessing mask-guided learning to address challenges posed by sparse annotations. MGMap sets a new benchmark in map vectorization, reflecting progress in the domain of autonomous driving, and offering avenues for widespread application in dynamic mapping and real-time navigation systems.
