StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction (2308.12570v2)

Published 24 Aug 2023 in cs.CV

Abstract: High-Definition (HD) maps are essential for the safety of autonomous driving systems. While existing techniques employ camera images and onboard sensors to generate vectorized high-precision maps, they are constrained by their reliance on single-frame input. This approach limits their stability and performance in complex scenarios such as occlusions, largely due to the absence of temporal information. Moreover, their performance diminishes when applied to broader perception ranges. In this paper, we present StreamMapNet, a novel online mapping pipeline adept at long-sequence temporal modeling of videos. StreamMapNet employs multi-point attention and temporal information which empowers the construction of large-range local HD maps with high stability and further addresses the limitations of existing methods. Furthermore, we critically examine widely used online HD Map construction benchmark and datasets, Argoverse2 and nuScenes, revealing significant bias in the existing evaluation protocols. We propose to resplit the benchmarks according to geographical spans, promoting fair and precise evaluations. Experimental results validate that StreamMapNet significantly outperforms existing methods across all settings while maintaining an online inference speed of $14.2$ FPS. Our code is available at https://github.com/yuantianyuan01/StreamMapNet.

Summary

The paper introduces StreamMapNet, which leverages long-sequence temporal modeling to produce robust vectorized HD maps.
It combines temporal information with a multi-point attention mechanism in an encoder-decoder architecture to handle occlusions and extend perception ranges.
The study refines evaluation protocols with re-split benchmarks and demonstrates 14.2 FPS online inference without sacrificing precision.

Insights into StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction

The paper introduces StreamMapNet, an online mapping pipeline that produces the vectorized HD maps autonomous driving systems depend on. It targets a limitation of existing methods: they rely on single-frame input and therefore lack temporal information. This reduces their stability and performance in complex driving scenarios such as occlusions, and their performance also degrades at broader perception ranges.

Key Contributions and Methodology

StreamMapNet performs long-sequence temporal modeling of video, which departs from earlier techniques that rely on single-frame input. The authors combine temporal information with a multi-point attention mechanism to construct local HD maps that remain stable over larger ranges. They also critically examine the evaluation protocols of two widely used online HD map construction benchmarks, Argoverse2 and nuScenes, and report significant bias that undermines fair model evaluation. They propose re-splitting these benchmarks by geographical span so that comparisons are fair and precise.
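To make the re-splitting idea concrete, here is a minimal sketch of a geographically disjoint split. It assumes each scene can be summarized by one representative map position; the `Scene` record, the 500 m grid, and the hash-based assignment are all hypothetical choices for illustration, not the procedure used in the paper.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Scene:
    """Hypothetical scene record: an id plus one representative position in meters."""
    scene_id: str
    x: float
    y: float


def cell_of(scene, cell_size=500.0):
    """Return the (column, row) index of the grid cell containing the scene."""
    return (int(scene.x // cell_size), int(scene.y // cell_size))


def split_by_cell(scenes, val_fraction=0.2, cell_size=500.0):
    """Send every scene of a grid cell to the same split.

    The split is decided per cell, not per scene, so no location
    can contribute scenes to both training and validation.
    """
    train, val = [], []
    for scene in scenes:
        cell = cell_of(scene, cell_size)
        # Deterministic pseudo-random bucket in [0, 1) derived from the cell index.
        digest = hashlib.sha256(repr(cell).encode()).digest()
        bucket = digest[0] / 256.0
        (val if bucket < val_fraction else train).append(scene)
    return train, val
```

A real trajectory can cross several cells, so a production split would need to assign whole trajectories or larger regions; the sketch only shows why deciding membership by location rather than by scene removes the overlap.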

The model follows an encoder-decoder design in which a bird's-eye-view (BEV) encoder extracts features from multi-view images. It is paired with a DETR-like decoder that uses "Multi-Point Attention" to model the long-range dependencies needed to recognize irregular and elongated map elements, which conventional detection methods handle poorly.
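The following NumPy sketch illustrates one way such an attention step could work: instead of sampling BEV features around a single reference point, a query samples around several points along its current polyline estimate and aggregates the results. The function names, the tensor shapes, and the single softmax over all sampled locations are assumptions made for illustration; in a trained model the offsets and logits would be predicted from the query, and the paper's actual layer may differ.

```python
import numpy as np


def bilinear_sample(bev, xy):
    """Bilinearly sample a (H, W, C) feature map at continuous (x, y) pixel coordinates."""
    h, w, _ = bev.shape
    x = np.clip(xy[..., 0], 0, w - 1)
    y = np.clip(xy[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = bev[y0, x0] * (1 - wx) + bev[y0, x1] * wx
    bottom = bev[y1, x0] * (1 - wx) + bev[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def multi_point_attention(bev, ref_points, offsets, logits):
    """Aggregate BEV features sampled around several reference points.

    bev:        (H, W, C) feature map
    ref_points: (P, 2) points along one polyline estimate, in pixel coordinates
    offsets:    (P, K, 2) sampling offsets around each reference point
    logits:     (P, K) unnormalized attention scores
    returns:    (C,) aggregated feature for the query
    """
    sample_xy = ref_points[:, None, :] + offsets           # (P, K, 2)
    feats = bilinear_sample(bev, sample_xy)                 # (P, K, C)
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()                       # softmax over all P*K samples
    return (weights[..., None] * feats).sum(axis=(0, 1))   # (C,)
```

Because the reference points are spread along the element, a single query can gather evidence from both ends of a long lane divider, which a single-reference-point scheme would have to reach through large offsets.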

The approach remains computationally efficient while extending the perception range from $60\times30\,\mathrm{m}$ to $100\times50\,\mathrm{m}$. In the reported experiments, StreamMapNet outperforms previous methods across all settings while maintaining an online inference speed of 14.2 FPS.
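As a quick back-of-the-envelope reading of these figures, the larger range covers a bit under three times the area of the smaller one, and 14.2 FPS corresponds to roughly 70 ms per frame. The summary does not say which range setting the speed was measured at, so the two numbers should not be assumed to describe the same configuration.

$$\frac{100 \times 50}{60 \times 30} = \frac{5000\,\mathrm{m}^2}{1800\,\mathrm{m}^2} \approx 2.78, \qquad \frac{1000\,\mathrm{ms}}{14.2} \approx 70.4\,\mathrm{ms\ per\ frame}$$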

Implications and Future Directions

StreamMapNet's robust framework presents significant implications for the field of autonomous driving and online map construction by addressing temporal inconsistencies and biases in evaluation benchmarks. The paper's focus on both extending the perception range without sacrificing performance and proposing fairer benchmarks highlights a rigorous and multi-faceted approach to methodological improvement.

From a practical point of view, the proposed model paves the way for better navigation decisions in self-driving vehicles, reducing reliance on resource-intensive and laborious traditional mapping methods. Theoretically, StreamMapNet's streaming strategy for temporal fusion underscores the potential for broader applications beyond vectorized map construction, perhaps into areas where real-time temporal consistency is crucial.
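A streaming approach to temporal fusion can be sketched as a loop that carries one memory tensor from frame to frame instead of re-reading a stack of past frames. The example below is a simplified illustration under stated assumptions: ego motion is reduced to a whole-cell grid shift, and a fixed blend weight `keep` stands in for whatever learned fusion module the model actually uses. All function names are hypothetical.

```python
import numpy as np


def warp_memory(memory, shift_cells):
    """Shift a (H, W, C) BEV memory by whole grid cells to compensate for ego motion.

    Cells that scroll in from outside the previous view are zero-filled.
    """
    dy, dx = shift_cells
    h, w, _ = memory.shape
    warped = np.zeros_like(memory)
    if abs(dy) >= h or abs(dx) >= w:
        return warped  # no overlap between the old and new views
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    warped[dst_y, dst_x] = memory[src_y, src_x]
    return warped


def stream_step(memory, current, shift_cells, keep=0.5):
    """One streaming update: align the previous memory, then blend with current features."""
    if memory is None:
        return current.copy()  # first frame: nothing to fuse with yet
    aligned = warp_memory(memory, shift_cells)
    return keep * aligned + (1.0 - keep) * current


def run_stream(frames, shifts, keep=0.5):
    """Process frames in order, carrying a single memory tensor across the sequence."""
    memory = None
    outputs = []
    for current, shift in zip(frames, shifts):
        memory = stream_step(memory, current, shift, keep)
        outputs.append(memory)
    return outputs
```

The per-frame cost stays constant regardless of sequence length, because only the latest memory is kept; information from older frames survives only through that memory.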

Future work could integrate StreamMapNet with other sensor data such as LiDAR or radar to further improve robustness, especially in adverse weather or low-visibility conditions. Evaluating the model on proprietary datasets could also test its adaptability to diverse environments and move it closer to industrial deployment.

Moreover, as AI safety and ethical considerations continue to grow in focus, the discussion surrounding how sensitive map data is collected, stored, and processed becomes increasingly pertinent. The alignment of AI technologies with the evolving legal landscape surrounding data privacy remains a critical area of exploration, possibly influencing future guidelines on technology deployment in public spaces.

In conclusion, StreamMapNet signifies a substantial stride forward in the domain of HD map construction for autonomous systems, combining temporal modeling with spatial efficiency to yield a robust and scalable mapping solution.
