SDTagNet: Leveraging Text-Annotated Navigation Maps for Online HD Map Construction (2506.08997v1)

Published 10 Jun 2025 in cs.CV and cs.RO

Abstract: Autonomous vehicles rely on detailed and accurate environmental information to operate safely. High definition (HD) maps offer a promising solution, but their high maintenance cost poses a significant barrier to scalable deployment. This challenge is addressed by online HD map construction methods, which generate local HD maps from live sensor data. However, these methods are inherently limited by the short perception range of onboard sensors. To overcome this limitation and improve general performance, recent approaches have explored the use of standard definition (SD) maps as prior, which are significantly easier to maintain. We propose SDTagNet, the first online HD map construction method that fully utilizes the information of widely available SD maps, like OpenStreetMap, to enhance far range detection accuracy. Our approach introduces two key innovations. First, in contrast to previous work, we incorporate not only polyline SD map data with manually selected classes, but additional semantic information in the form of textual annotations. In this way, we enrich SD vector map tokens with NLP-derived features, eliminating the dependency on predefined specifications or exhaustive class taxonomies. Second, we introduce a point-level SD map encoder together with orthogonal element identifiers to uniformly integrate all types of map elements. Experiments on Argoverse 2 and nuScenes show that this boosts map perception performance by up to +5.9 mAP (+45%) w.r.t. map construction without priors and up to +3.2 mAP (+20%) w.r.t. previous approaches that already use SD map priors. Code is available at https://github.com/immel-f/SDTagNet

Summary

The paper presents a method that uses BERT-based tag embeddings to turn the text annotations of SD maps into semantic features for online HD map construction.
It introduces a point-level SD map encoder and orthogonal random feature (ORF) element identifiers to integrate all types of map elements uniformly.
Experiments on Argoverse 2 and nuScenes show gains of up to +5.9 mAP (+45%) over map construction without priors and up to +3.2 mAP (+20%) over previous methods that already use SD map priors.

The paper presents SDTagNet, a novel approach to online high-definition (HD) map construction for autonomous vehicles by leveraging standard definition (SD) maps such as OpenStreetMap (OSM). Autonomous vehicles require precise and extensive environmental information to function safely, conventionally provided by HD maps that capture lane-level road geometry, traffic rules, and more. However, their high maintenance cost presents significant scalability challenges. Conversely, SD maps offer broad accessibility and global scale with reduced maintenance effort, albeit typically at lower resolution and detail.

SDTagNet distinguishes itself by exploiting a largely untapped part of SD maps: their textual annotations. These annotations carry rich semantic information that prior approaches have mostly ignored. Earlier works focused on polyline data with manually selected classes, which restricts the prior to a preselected set of road elements. SDTagNet removes this restriction by applying NLP techniques, specifically a BERT-based tag embedding model, to convert text annotations into semantic features for map construction.
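
The idea of feeding free-text tags to the model instead of a fixed class list can be illustrated with a minimal sketch. It assumes OSM-style key=value tags and replaces the paper's BERT-based model with a toy hashed bag-of-tokens encoder, so the functions `serialize_tags` and `toy_text_embedding`, the vector size, and the example tags are all hypothetical and not taken from the SDTagNet code.

```python
import hashlib
import math


def serialize_tags(tags):
    """Turn a tag dict into one text string; keys are sorted for determinism."""
    return " ; ".join(f"{key}={value}" for key, value in sorted(tags.items()))


def toy_text_embedding(text, dim=16):
    """Toy stand-in for a learned text encoder (NOT BERT).

    Each token is hashed into one of `dim` buckets and the resulting
    count vector is L2-normalized.
    """
    vec = [0.0] * dim
    for token in text.replace("=", " ").replace(";", " ").split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


# Hypothetical SD map elements with free-form tags (no fixed class list).
road = {"highway": "residential", "maxspeed": "30", "lanes": "2"}
crossing = {"highway": "crossing", "crossing": "marked"}

road_text = serialize_tags(road)
road_vec = toy_text_embedding(road_text)
crossing_vec = toy_text_embedding(serialize_tags(crossing))
```

Because any tag string maps to a fixed-size vector, a new or rare tag needs no change to the input format. In the paper this role is played by a learned BERT-based embedding, which, unlike the hashing stand-in, captures semantic similarity between tags.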

Key Innovations

Point-Level Encoding: Unlike earlier SD-map-prior methods, SDTagNet encodes SD map elements at the level of individual points rather than whole polylines. This matters because HD map construction benefits from detailed point-level information, which a coarse per-polyline representation does not provide.
Orthogonal Random Features (ORF) Element Identifiers: Inspired by graph transformers, SDTagNet assigns ORF identifiers to map elements so that points, polylines, and relations can be integrated uniformly. The identifiers indicate which tokens belong to the same element, so element membership is retained once the map is represented at point level.
NLP Tag Embedding: A BERT model derives tag embeddings from the text annotations, and contrastive pretraining focuses the embeddings on semantically salient tags. This uses all textual information available in the SD map without manual feature engineering or a predefined class taxonomy.
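
How point-level tokens and orthogonal element identifiers fit together can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: identifiers are built by Gram-Schmidt orthonormalization of Gaussian vectors, each token is simply the point coordinates concatenated with the identifier of its parent element, and the names `orthonormal_identifiers`, `point_tokens`, and the identifier size of 8 are hypothetical.

```python
import math
import random


def orthonormal_identifiers(num_elements, dim, seed=0):
    """Return `num_elements` mutually orthonormal random vectors of length `dim`."""
    if num_elements > dim:
        raise ValueError("exact orthogonality needs dim >= num_elements")
    rng = random.Random(seed)
    basis = []
    while len(basis) < num_elements:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Gram-Schmidt step: remove the components along earlier vectors.
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip a degenerate draw and sample again
            basis.append([x / norm for x in v])
    return basis


def point_tokens(elements, id_dim=8, seed=0):
    """Build one token per point: [x, y] followed by the parent element's identifier."""
    ids = orthonormal_identifiers(len(elements), id_dim, seed)
    tokens = []
    for elem_id, points in zip(ids, elements):
        for x, y in points:
            tokens.append([x, y] + elem_id)
    return tokens


# Hypothetical SD map: a single-point node and a three-point polyline.
sd_elements = [
    [(1.0, 2.0)],
    [(0.0, 0.0), (5.0, 0.0), (10.0, 1.0)],
]
tokens = point_tokens(sd_elements)
ids = orthonormal_identifiers(len(sd_elements), 8)
```

In this sketch the node and the polyline share one token format, and the three polyline points carry the same identifier, so a downstream attention layer could in principle recover which points form one element. How SDTagNet actually combines identifiers with point and tag features is defined in the paper and its repository.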

Experimental Results and Implications

Experiments conducted on the Argoverse 2 and nuScenes datasets reveal substantial improvements in map perception performance. SDTagNet achieves up to +5.9 mAP (+45%) improvement over construction methods without map priors and up to +3.2 mAP (+20%) over previous techniques incorporating SD map priors. These advancements underscore the efficacy of integrating extensive textual annotations with geometric map data for autonomous vehicle navigation.
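
As a rough sanity check on how the absolute and relative gains relate, the baseline score implied by each pair can be back-computed. The helper `implied_baseline` is hypothetical, and the outputs are estimates only: the percentages are rounded, and the "up to" figures may come from different datasets or evaluation settings.

```python
def implied_baseline(abs_gain_map, rel_gain):
    """Baseline mAP implied by an absolute gain and the matching relative gain."""
    return abs_gain_map / rel_gain


# +5.9 mAP reported as +45% over map construction without priors.
no_prior_baseline = implied_baseline(5.9, 0.45)

# +3.2 mAP reported as +20% over previous SD-map-prior methods.
sd_prior_baseline = implied_baseline(3.2, 0.20)

print(f"implied no-prior baseline: {no_prior_baseline:.1f} mAP")
print(f"implied SD-prior baseline: {sd_prior_baseline:.1f} mAP")
```

The implied baselines are roughly 13 mAP and 16 mAP. Values this low suggest that the largest gains occur in a difficult setting, which is consistent with the abstract's emphasis on far-range detection.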

These results have practical implications. By using SD maps effectively, autonomous systems can reduce their dependence on expensive, labor-intensive HD maps while improving detection accuracy beyond the short range of onboard sensors. This could make autonomous driving easier to deploy at scale, particularly in rapidly changing or sparsely mapped urban areas.

Additionally, SDTagNet establishes a scalable framework that accommodates self-supervised pretraining on vast global map datasets. This methodological flexibility opens pathways for further innovations in map-based navigation and planning systems within AI-driven applications.

Future Directions

The paper encourages exploring stronger NLP techniques to refine the tag embeddings and to cover a broader range of map element types. Integrating SDTagNet with dynamic map update mechanisms could also address map change detection and improve real-time accuracy in rapidly evolving urban environments. Continued evaluation across diverse datasets, with varying degrees of agreement between SD and HD maps, remains necessary to validate and extend these findings.

In conclusion, SDTagNet represents a significant stride toward leveraging SD maps for robust, scalable HD map construction, paving the way for efficient autonomous navigation solutions.
