OpenLane-V2 Benchmark for Autonomous HD Mapping
- The benchmark presents a unified evaluation framework that advances HD mapping by combining 3D lane detection, traffic element recognition, and topology reasoning.
- It features detailed annotation protocols with 2,000 multi-view scene segments and rich connectivity graphs to capture complex urban driving scenarios.
- Baseline methods employing transformers, instance segmentation, and temporal propagation demonstrate substantial improvements in autonomous driving scene understanding.
OpenLane-V2 is a unified benchmark advancing perception and topology reasoning for high-definition (HD) mapping in autonomous driving. It extends prior lane detection benchmarks by integrating multi-view road scene imagery, instance-level annotation of lane centerlines and traffic elements, and explicit semantic and topological relationships, establishing a foundation for scene structure understanding in ADAS and AV research.
1. Dataset Architecture and Annotation Protocols
OpenLane-V2 builds upon Argoverse 2 and nuScenes, comprising 2,000 multi-view scene segments collected across diverse urban regions: six U.S. cities (subset_A, from Argoverse 2) plus Boston and Singapore (subset_B, from nuScenes), with multi-view imaging annotated at 2 Hz. Annotations span ±50 m in x and ±25 m in y around the ego vehicle.
Lane annotations are conceptual: each instance is a 3D centerline given as an ordered sequence of points, representing the driving trajectory rather than visual lane lines, and merged or split according to connectivity (unique predecessor/successor logic). Traffic elements, including traffic lights, signs, and road markings, are annotated in the 2D front view with bounding boxes and semantic attributes (color, directional intent).
The structured annotation protocol encodes:
- Adjacency matrix delineating lane-lane connectivity as a directed graph (an edge links a terminating centerline to each centerline that initiates from its endpoint)
- Lane–traffic element correspondence as a bipartite graph (an edge expresses which signal controls which lane)
Statistically, frames contain on average 24–26 centerlines and 3–4 traffic elements, with rich coverage of intersections, merges, splits, and control structures. Annotations total roughly 2.1M lane instances and 1.9M topology relationships.
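A minimal sketch of how these two graphs can be held as plain matrices (hypothetical shapes and indices for illustration; the official devkit provides its own data classes):

```python
import numpy as np

# Hypothetical frame with 4 lane centerlines and 2 traffic elements.
num_lanes, num_elements = 4, 2

# Lane-lane connectivity: directed graph. adjacency[i, j] = 1 means the end of
# centerline i coincides with the start of centerline j (predecessor -> successor).
lane_adjacency = np.zeros((num_lanes, num_lanes), dtype=np.int8)
lane_adjacency[0, 1] = 1   # lane 0 feeds into lane 1
lane_adjacency[0, 2] = 1   # ... and also splits into lane 2

# Lane-traffic element correspondence: bipartite graph. lane_element[i, k] = 1
# means traffic element k (e.g., a signal) governs centerline i.
lane_element = np.zeros((num_lanes, num_elements), dtype=np.int8)
lane_element[1, 0] = 1     # element 0 controls lane 1

print(np.nonzero(lane_adjacency[0])[0])  # successors of lane 0 -> [1 2]
```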
2. Subtask Formalization and Evaluation Metrics
OpenLane-V2 decomposes scene reasoning into three primary subtasks, each equipped with rigorous quantitative metrics.
(a) 3D Lane Detection:
Predicts the geometry of centerlines in 3D. Evaluation relies on the discrete Fréchet distance between matched ground-truth and predicted lane curves. A positive match requires the distance to fall below a threshold $t \in T = \{1.0, 2.0, 3.0\}$ m, yielding per-threshold average precision; the final metric is
$\mathrm{DET}_{l} = \frac{1}{|T|} \sum_{t \in T} \mathrm{AP}_t$
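A minimal sketch of the discrete Fréchet distance via its standard dynamic program (NumPy; the AP machinery around it is omitted):

```python
import numpy as np

def discrete_frechet(p: np.ndarray, q: np.ndarray) -> float:
    """Discrete Frechet distance between polylines p (N, 3) and q (M, 3)."""
    n, m = len(p), len(q)
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # pairwise distances
    ca = np.full((n, m), np.inf)
    ca[0, 0] = d[0, 0]
    for i in range(1, n):                     # first column: forced coupling
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):                     # first row: forced coupling
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return float(ca[-1, -1])

# A predicted curve matches a ground-truth curve at threshold t when
# discrete_frechet(pred, gt) < t, for t in {1.0, 2.0, 3.0} meters.
```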
(b) Traffic Element Recognition:
Detects and classifies traffic lights, signs, etc. in the 2D front view, reporting semantic attributes. Evaluation uses IoU-based matching with a threshold of $0.75$; average precision is computed per attribute $a \in A$ and averaged, yielding the metric
$\mathrm{DET}_{t} = \frac{1}{|A|} \sum_{a \in A} \mathrm{AP}_a$
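A minimal IoU matcher for axis-aligned 2D boxes, as used in this matching (the corner-coordinate box format is an assumption):

```python
def box_iou(a, b):
    """IoU of two 2D boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A prediction is a true positive when IoU >= 0.75 and its semantic
# attribute (e.g., "red light") matches the ground truth.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))  # -> 0.333...
```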
(c) Topology Reasoning:
Infers edge connections (lane-lane, lane–traffic) among detected entities. Graph-based metrics adapt average precision for link prediction:
$\mathrm{TOP} = \frac{1}{|V|} \sum_{v \in V} \frac{\sum_{\hat{n}' \in \hat{N}'(v)} P(\hat{n}') \mathbbm{1}(\hat{n}' \in N(v))}{|N(v)|}$
with $N(v)$ the ground-truth neighbor set of vertex $v$, $\hat{N}'(v)$ the predicted neighbors ranked by confidence, and $P(\hat{n}')$ the precision at the rank of $\hat{n}'$. Scores $\mathrm{TOP}_{ll}$ and $\mathrm{TOP}_{lt}$ are computed for lane-lane connectivity and lane–traffic correspondence, respectively.
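A minimal sketch of this TOP computation (the official implementation first matches predicted vertices to ground-truth vertices, which is elided here; the edge-confidence threshold is an assumption):

```python
import numpy as np

def top_score(gt_adj: np.ndarray, pred_conf: np.ndarray, thresh: float = 0.5) -> float:
    """TOP: mean over vertices of list-wise AP on confidence-ranked neighbors.

    gt_adj:    (V, V) binary ground-truth adjacency.
    pred_conf: (V, V) predicted edge confidences in [0, 1].
    """
    scores = []
    for v in range(len(gt_adj)):
        gt_n = np.nonzero(gt_adj[v])[0]
        if len(gt_n) == 0:
            continue  # vertices with no ground-truth neighbors are skipped
        pred_n = np.nonzero(pred_conf[v] >= thresh)[0]
        ranked = pred_n[np.argsort(-pred_conf[v, pred_n])]  # confidence order
        hits, acc = 0, 0.0
        for rank, n in enumerate(ranked, start=1):
            if gt_adj[v, n]:          # indicator: predicted neighbor is correct
                hits += 1
                acc += hits / rank    # P(n') = precision at this rank
        scores.append(acc / len(gt_n))
    return float(np.mean(scores)) if scores else 0.0
```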
Global performance is summarized by the OpenLane-V2 Score (OLS):
$\mathrm{OLS} = \frac{1}{4} \left[ \mathrm{DET}_{l} + \mathrm{DET}_{t} + f(\mathrm{TOP}_{ll}) + f(\mathrm{TOP}_{lt}) \right]$
with scaling function $f(x) = \sqrt{x}$ to accentuate the topology tasks.
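As a worked check of the composite score, a minimal sketch with made-up sub-scores:

```python
import numpy as np

def ols(det_l, det_t, top_ll, top_lt):
    """OpenLane-V2 Score: mean of the four sub-scores, with the square root
    applied to the topology terms to accentuate them."""
    return float(np.mean([det_l, det_t, np.sqrt(top_ll), np.sqrt(top_lt)]))

print(ols(0.40, 0.60, 0.04, 0.16))  # (0.40 + 0.60 + 0.20 + 0.40) / 4 = 0.40
```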
3. Scene Structure and Topology Reasoning
Traditional lane benchmarks isolate detection from semantic reasoning. OpenLane-V2 compels integrated scene graph prediction: networks must reconstruct directed connectivity among centerlines (e.g., mapping traffic flows at merges/splits/intersections) and also assign control relationships between lane centerlines and governing traffic elements. This mirrors human driving, which interprets both geometric and regulatory cues.
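To make the connectivity requirement concrete, a minimal sketch of downstream-flow reasoning over the lane-lane adjacency matrix from the annotation protocol above:

```python
from collections import deque
import numpy as np

def reachable_lanes(adj: np.ndarray, start: int) -> list:
    """BFS over the directed lane-lane graph: every centerline reachable from
    `start`, i.e., the downstream traffic flow through merges and splits."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        v = queue.popleft()
        order.append(v)
        for n in np.nonzero(adj[v])[0]:
            if int(n) not in seen:
                seen.add(int(n))
                queue.append(int(n))
    return order

adj = np.zeros((4, 4), dtype=np.int8)
adj[0, 1] = adj[1, 2] = adj[1, 3] = 1  # lane 1 splits into lanes 2 and 3
print(reachable_lanes(adj, 0))         # -> [0, 1, 2, 3]
```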
Explicit annotation of these relationships enables direct training and robust graph evaluation, stimulating algorithmic advances in reasoning over complex HD map topologies. Notably, the benchmark motivated methods—such as TopoNet, TopoMask (Kalfaoglu et al., 2023), Topo2D (Li et al., 5 Jun 2024), and FASTopoWM (Yang et al., 31 Jul 2025)—to move from parametric or keypoint-based representations to instance-level, mask-based, or temporally enriched models that reason over holistic scene graphs.
4. Baseline and State-of-the-Art Methods
Several approaches have been benchmarked on OpenLane-V2, each advancing the field:
- PETRv2+YOLOv8+MLP (CVPR2023 1st-place) (Wu et al., 2023):
Multi-stage pipeline combining transformer-based 3D centerline detection (PETRv2), fast traffic element detection (YOLOv8 with advanced augmentation and pseudo-labeling), and MLP-based topology heads (a sketch of such a head follows this list). Achieves 55% OLS, outperforming prior methods by a clear margin; decoupling detection from topology enables modular optimization and robust performance.
- TopoMask (Kalfaoglu et al., 2023):
Instance-mask formulation for centerline prediction with direction-label enrichment. A Mask2Former backbone enables direct instance segmentation; flow ordering by axis monotonicity sorts unordered mask points into valid centerlines (see the ordering sketch after this list). Achieves 39.2% OLS, ranking 2nd for centerline F1; competitive with TopoNet on Fréchet metrics and surpassing it on Chamfer.
- Topo2D (Li et al., 5 Jun 2024):
Transformer-based fusion of high-recall 2D lane queries into 3D detection and topology reasoning. 2D lane priors initialize queries in 3D space and enhance connectivity classification. Achieves 44.5% OLS, exceeding TopoNet by 11.4% in lane-lane topology.
- FASTopoWM (Yang et al., 31 Jul 2025):
Fast-slow decoding with latent world models for temporal propagation. Parallel supervision fuses current-frame and historical queries via transformer world models conditioned on action latents. Improves lane segment detection (37.4% mAP) and centerline OLS (46.3%), outperforming Topo2Seq and TopoFormer.
- LanePerf (Wu et al., 17 Jul 2025):
Performance estimation framework integrating DeepSets over lane outputs with pretrained image encoders for label-free robustness assessment (a DeepSets sketch also follows this list). Reports an MAE of 0.117 and a correlation of 0.727, enabling deployment safety monitoring under domain shifts.
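As referenced in the first entry above, a minimal sketch of an MLP topology head; the embedding dimension, pairwise concatenation scheme, and layer sizes are illustrative assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn

class TopologyHead(nn.Module):
    """Scores every (lane, lane) or (lane, traffic element) pair by passing the
    concatenated instance embeddings through an MLP with a sigmoid edge score."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
        # src: (N, dim) lane embeddings; dst: (M, dim) lane or element embeddings.
        pairs = torch.cat(
            [src.unsqueeze(1).expand(-1, dst.size(0), -1),
             dst.unsqueeze(0).expand(src.size(0), -1, -1)],
            dim=-1,
        )  # (N, M, 2*dim) pairwise concatenation
        return torch.sigmoid(self.mlp(pairs)).squeeze(-1)  # (N, M) edge scores

head = TopologyHead()
print(head(torch.randn(4, 256), torch.randn(2, 256)).shape)  # torch.Size([4, 2])
```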
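The TopoMask-style flow ordering reduces to sorting unordered mask points monotonically along the instance's dominant axis; a simplified sketch (the axis heuristic is an assumption):

```python
import numpy as np

def order_centerline_points(points: np.ndarray) -> np.ndarray:
    """Sort unordered centerline points (N, 3) monotonically along the axis
    with the largest extent; a predicted direction label would then decide
    ascending vs. descending order."""
    extent = points[:, :2].max(axis=0) - points[:, :2].min(axis=0)
    axis = int(np.argmax(extent))  # 0 = x-dominant, 1 = y-dominant
    return points[np.argsort(points[:, axis])]

pts = np.array([[5.0, 1.0, 0.0], [1.0, 0.9, 0.0], [3.0, 1.1, 0.0]])
print(order_centerline_points(pts))  # rows reordered by ascending x
```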
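Finally, a minimal DeepSets aggregator of the kind LanePerf builds on, assuming hypothetical per-lane descriptors and omitting the pretrained image-encoder branch:

```python
import torch
import torch.nn as nn

class LaneSetEncoder(nn.Module):
    """DeepSets: encode each lane's descriptor independently (phi), pool with a
    permutation-invariant sum, then regress a performance estimate (rho)."""

    def __init__(self, lane_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(lane_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, lanes: torch.Tensor) -> torch.Tensor:
        # lanes: (num_lanes, lane_dim) variable-size set of per-lane descriptors.
        return self.rho(self.phi(lanes).sum(dim=0))  # scalar performance estimate

model = LaneSetEncoder()
print(model(torch.randn(17, 32)).shape)  # torch.Size([1]) for any set size
```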
5. Technical Innovations and LaTeX Metric Definitions
OpenLane-V2 introduces several technical contributions:
- Instance-level Centerline Representation:
Annotated as ordered 3D points; topology encoded via adjacency matrices.
- Topology Graph Evaluation:
Directed (lane-lane) and bipartite (lane–traffic) graphs scored with the TOP metric below.
- Discrete Fréchet Distance for Curve Matching:
$d_F(P, Q) = \min_{(\alpha, \beta)} \max_{k} \left\| P_{\alpha(k)} - Q_{\beta(k)} \right\|_2$
over order-preserving couplings $(\alpha, \beta)$ of the points of curves $P$ and $Q$.
- Detection Score Averaging:
$\mathrm{DET} = \frac{1}{|T|} \sum_{t \in T} \mathrm{AP}_t$ over matching thresholds $T$ (Fréchet distances for lanes; attributes at IoU $0.75$ for traffic elements)
- Topology Reasoning Score:
$\mathrm{TOP} = \frac{1}{|V|} \sum_{v \in V} \frac{\sum_{\hat{n}' \in \hat{N}'(v)} P(\hat{n}') \mathbbm{1}(\hat{n}' \in N(v))}{|N(v)|}$
- Composite Benchmark Score:
$\mathrm{OLS} = \frac{1}{4} \left[ \mathrm{DET}_{l} + \mathrm{DET}_{t} + \sqrt{\mathrm{TOP}_{ll}} + \sqrt{\mathrm{TOP}_{lt}} \right]$
6. Impact, Open Challenges, and Future Trajectories
The OpenLane-V2 benchmark elevates autonomous driving scene understanding by integrating structure, semantics, and topology. Real-world challenges include occlusions, distant lane reconstruction, and ambiguous connectivity, and topology reasoning remains difficult (TOP scores are consistently lower than detection scores across methods). Incorporating greater dynamic-agent and environmental diversity (e.g., vehicles, pedestrians), expanding sensor modalities (LiDAR, radar), and adopting graph neural networks for reasoning constitute natural research extensions.
Recent contributions suggest that temporal propagation with latent world models (FASTopoWM), fusion of 2D and 3D features (Topo2D), and multi-modal dual-view perception (DV-3DLane (Luo et al., 23 Jun 2024)) are promising directions, indicating that robust, scalable HD map reasoning can be achieved with multi-domain, temporally enriched, and graph-attentive models.
OpenLane-V2 thus remains central for benchmarking perception and reasoning, setting a rigorous, extensible framework for HD mapping, planning, and reliable AV deployment in rich, real-world scenarios.