
VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene (2304.09807v2)

Published 19 Apr 2023 in cs.CV

Abstract: High-definition (HD) map serves as the essential infrastructure of autonomous driving. In this work, we build up a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD map of large-scale driving scene. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns as unified point sequence representation, which can be extended to most map elements in the driving scene. VMA is highly efficient and extensible, requiring negligible human effort, and flexible in terms of spatial scale and element type. We quantitatively and qualitatively validate the annotation performance on real-world urban and highway scenes, as well as NYC Planimetric Database. VMA can significantly improve map generation efficiency and require little human effort. On average VMA takes 160min for annotating a scene with a range of hundreds of meters, and reduces 52.3% of the human cost, showing great application value. Code: https://github.com/hustvl/VMA.

Authors (10)
  1. Shaoyu Chen (26 papers)
  2. Yunchi Zhang (2 papers)
  3. Bencheng Liao (20 papers)
  4. Jiafeng Xie (12 papers)
  5. Tianheng Cheng (31 papers)
  6. Wei Sui (16 papers)
  7. Qian Zhang (308 papers)
  8. Chang Huang (46 papers)
  9. Wenyu Liu (146 papers)
  10. Xinggang Wang (163 papers)
Citations (9)

Summary

Overview of the Vectorized Map Annotation System for Large-Scale Driving Scenes

The paper "VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene" presents a significant contribution to HD map generation, a crucial capability for autonomous driving. The proposed Vectorized Map Annotation (VMA) framework improves the efficiency of HD map creation through a divide-and-conquer annotation scheme that provides spatial extensibility, and it handles a diverse array of geometric map elements through a unified point sequence representation. The framework is designed to be automatic and flexible, substantially reducing human involvement while adapting to various map element types and spatial ranges.

Key Components and Methodology

VMA is designed to process extensive driving scenes automatically and efficiently. The framework comprises four key stages:

  1. Scene Reconstruction: The framework starts with a robust system for reconstructing static scenes. This involves crowd-sourced data collection across multiple trips, dynamic object filtering, motion distortion compensation, and multi-trip point cloud aggregation. Each of these steps is crucial for creating a dense and semantically rich point cloud map that serves as the foundation for further annotation processes.
  2. Map Element Representation: VMA introduces a unified point sequence as the standard representation for map elements of varied geometry: line elements (e.g., lane dividers, curbs), discrete elements (e.g., arrows, speed bumps), and area elements (e.g., crosswalks). This abstraction generalizes different geometric structures into a single consistent format.
  3. Divide-and-Conquer Annotation Scheme: The annotation strategy splits a large scene into manageable units based on odometry information, allowing for parallel processing. The MapTR-based Unit Annotator model is developed to automatically output a vectorized map for each unit. This model is trained and continuously improved through a closed-loop learning strategy using human-verified annotations.
  4. Annotation Merging and Sparsification: The vectorized maps from individual units are incrementally merged to form a global vectorized map. Techniques for element merging vary according to the geometric type of the element, ensuring that continuity and completeness are maintained. Additionally, point sparsification is applied to streamline the map for storage and application efficiency.
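Two of the steps above, splitting a scene into units along the odometry trajectory and sparsifying the resulting point sequences, can be sketched as follows. This is an illustrative rendering of the general idea, not the paper's implementation: the function names, the 30 m unit length, the Ramer-Douglas-Peucker sparsification, and the tolerance values are all assumptions.

```python
import numpy as np

def split_into_units(trajectory, unit_length=30.0):
    """Split an odometry trajectory (N x 2 array of poses) into unit
    segments whose accumulated arc length stays within unit_length.
    Consecutive units share a boundary point, which aids later merging.
    The 30 m default is illustrative, not taken from the paper."""
    units, start, travelled = [], 0, 0.0
    for i in range(1, len(trajectory)):
        travelled += np.linalg.norm(trajectory[i] - trajectory[i - 1])
        if travelled >= unit_length:
            units.append(trajectory[start:i + 1])
            start, travelled = i, 0.0
    if start < len(trajectory) - 1:
        units.append(trajectory[start:])
    return units

def rdp_sparsify(points, tol=0.05):
    """Ramer-Douglas-Peucker sparsification of a point sequence:
    recursively keep only points that deviate more than tol from the
    chord between the endpoints (one possible sparsification choice)."""
    if len(points) < 3:
        return points
    a, b = points[0], points[-1]
    chord = b - a
    norm = np.linalg.norm(chord)
    if norm == 0:
        dists = np.linalg.norm(points - a, axis=1)
    else:
        # Perpendicular distance of each point to the chord (2D cross product).
        dists = np.abs(chord[0] * (points[:, 1] - a[1])
                       - chord[1] * (points[:, 0] - a[0])) / norm
    idx = int(np.argmax(dists))
    if dists[idx] > tol:
        left = rdp_sparsify(points[:idx + 1], tol)
        right = rdp_sparsify(points[idx:], tol)
        return np.vstack([left[:-1], right])
    return np.vstack([a, b])
```

For example, a dense polyline sampled along a straight lane divider collapses to its two endpoints, while points on a curve are retained wherever they deviate from the chord by more than the tolerance.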

Numerical Success and Efficiency

VMA showcases its ability to significantly enhance map generation efficiency, reducing human involvement by 52.3%. The system can annotate a scene covering hundreds of meters in an average of 160 minutes. Evaluation on various scenes, including real-world urban and highway scenes as well as the NYC Planimetric Database, highlights strong performance metrics. Notably, VMA shows high precision, recall, and F1-scores under strict displacement thresholds, indicating its effectiveness in modeling diverse map elements.
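A displacement-threshold evaluation of this kind can be sketched with a Chamfer-style matching criterion: a predicted point counts as matched if it lies within the threshold of some ground-truth point, and vice versa for recall. This is a generic illustration under assumed conventions; the paper's exact matching protocol and thresholds may differ.

```python
import numpy as np

def prf_at_threshold(pred, gt, tau=0.5):
    """Precision, recall, and F1 for vectorized map elements under a
    displacement threshold tau (in meters). pred and gt are (N, 2) and
    (M, 2) arrays of element points; the nearest-point criterion and
    the 0.5 m default are illustrative assumptions."""
    # Pairwise distances between every predicted and ground-truth point.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = float(np.mean(d.min(axis=1) <= tau))  # pred points matched to gt
    recall = float(np.mean(d.min(axis=0) <= tau))     # gt points covered by pred
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Tightening tau penalizes small geometric displacements more strictly, which is why strong scores under strict thresholds indicate accurate element geometry rather than mere detection.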

Implications and Future Directions

The VMA framework offers considerable practical and theoretical implications. Practically, it provides a scalable solution for automated map generation, essential for deploying and maintaining autonomous driving systems. Theoretically, it sets a precedent for unified geometric representation and efficient annotation methodologies in spatial computing fields.

Looking forward, the integration of additional sensor modalities such as camera and radar data could further enhance scene reconstruction fidelity. Moreover, the divide-and-conquer strategy could be adapted for more complex tasks, like lane graph construction, in broader robotic and autonomous navigation applications.

Through its comprehensive approach to HD map generation, VMA stands as a robust system with substantial application potential in the autonomous driving sector. Further developments building upon this work could address remaining challenges associated with annotation quality and scene complexity, paving the way for increasingly autonomous systems.