The nuScenes Knowledge Graph (nSKG) is a comprehensive semantic traffic scene representation constructed over the nuScenes autonomous driving dataset. It formalizes map topology, agents, traffic rules, and their semantic-spatial interactions through a typed attributed directed graph for downstream tasks such as trajectory prediction and scene understanding. nSKG leverages an OWL-DL ontology (SROIQ(D)) and stores all data as RDF triples, supporting explicit reasoning and scalable integration into graph neural network (GNN) and foundation model pipelines (Mlodzian et al., 2023, Zhou et al., 24 Mar 2025).
1. Formal Structure and Ontology Specification
nSKG is defined formally as a heterogeneously typed attributed directed graph

G = (V, E, R, X_V, X_E),

with V denoting all ontology entities (nodes), E the set of directed edges encoding semantic/spatial relations, R the finite set of edge relation types, X_V the node-feature matrix, and X_E the edge-feature matrix.
The ontology comprises 42 classes, 10 object properties, and 24 data-type properties (Zhou et al., 24 Mar 2025). Major schema modules include:
- Agent Module: Vehicle, Human, Micro-mobility, Static Obstacle subclasses with fine-grained types (e.g., Car, Truck, ConstructionWorker, Stroller).
- Map Module: Hierarchical, with RoadSegment, Lane, LaneSnippet, LaneSlice, Intersection, PedCrossing, Walkway, CarParkArea, TrafficLightStopArea, and more.
- Scene Module: Scene entity capturing state, temporal linkage (prevScene, nextScene), timestamp, and participant links (hasParticipant).
Object properties encode topological, temporal, and physical relationships (e.g., hasParticipant, prevScene/nextScene, onLane, connectedTo, stopAt). Reasoning axioms enable multi-scale aggregation and transitive closure over the part-of hierarchy (e.g., LaneSlice → LaneSnippet → Lane) (Mlodzian et al., 2023).
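The part-of reasoning axiom can be illustrated with a pure-Python transitive-closure sketch, standing in for what an OWL reasoner materializes over the real ontology (entity IDs here are hypothetical):

```python
from collections import defaultdict

# Asserted partOf triples (subject, object), mirroring the schema hierarchy.
asserted = [
    ("slice_17", "snippet_4"),   # LaneSlice partOf LaneSnippet
    ("snippet_4", "lane_2"),     # LaneSnippet partOf Lane
    ("lane_2", "roadblock_0"),   # Lane partOf RoadBlock (illustrative)
]

def transitive_closure(pairs):
    """Materialize the closure of a transitive property such as partOf."""
    adj = defaultdict(set)
    for s, o in pairs:
        adj[s].add(o)
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Compose known closure edges with asserted edges until fixpoint.
        for s, o in list(closure):
            for o2 in adj.get(o, ()):
                if (s, o2) not in closure:
                    closure.add((s, o2))
                    changed = True
    return closure

inferred = transitive_closure(asserted)
assert ("slice_17", "lane_2") in inferred  # LaneSlice partOf Lane, inferred
```

A reasoner over the SROIQ(D) ontology performs this kind of materialization for all transitive properties, so queries at any scale (slice, snippet, lane) resolve without manual joins.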
2. Graph Construction Pipeline
Raw nuScenes sensor streams (LiDAR, multi-camera, radar, IMU/GNSS) supply perception and map data. nSKG construction follows an ontology-driven transformation pipeline:
- Perception Stage: 3D object detection, semantic segmentation, instance segmentation, BEV-projection for dynamic/static elements.
- ABox Population: Scene entity instantiation per timestamp (typically at 10 Hz), agents with class labels, location, heading, velocity; spatial joins via Geo-SPARQL (e.g., agent location within LaneSlice polygons) (Zhou et al., 24 Mar 2025).
- Temporal and Spatial Linking: Build temporal scene chains (prevScene/nextScene) and agent trajectories (inNextScene). Map topology linked with Lane, LaneSnippet, LaneSlice, LaneConnector, road blocks.
- LaneSnippet Extraction: Lane borders are split at border-type changes or at a maximum length of 20 m. Adjacent snippets are connected via semantic switchViaX relations (e.g., SingleSolid, DoubleDashed).
- LaneSlice Geometry: For center arcline points, derive left/right border proximity and compute width, then instantiate LaneSlice nodes—linked sequentially (hasNextLaneSlice).
- Stop Areas and Crossings: Map stop_line elements to StopArea and traffic_light elements to TrafficLightStopArea; identify pedestrian crossings by spatial proximity of walkways within 5 m.
- RoadBlock Grouping: Lane clustering by shared surface/direction, connective linkage.
- RDF Conversion: Dump entire entity-relation set as RDF triples, loaded to triplestores (Blazegraph, Owlready2).
- nSTP Extraction: For trajectory prediction, extract 2 s of agent history, spatially reachable map elements, and other SceneParticipants, and normalize coordinates to be shift- and rotation-invariant (Mlodzian et al., 2023).
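The Geo-SPARQL spatial join in the ABox-population step reduces to point-in-polygon tests between agent positions and LaneSlice footprints. A minimal pure-Python stand-in using ray casting (the lane-slice polygons and IDs are hypothetical; the real pipeline issues queries against a triplestore):

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: does (x, y) fall inside the polygon (vertex list)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray extending to the right.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical LaneSlice footprints (axis-aligned here for clarity).
lane_slices = {
    "slice_a": [(0, 0), (4, 0), (4, 2), (0, 2)],
    "slice_b": [(4, 0), (8, 0), (8, 2), (4, 2)],
}

def locate_agent(x, y):
    """Emit an isOn triple for each LaneSlice containing the agent position."""
    return [("agent_1", "isOn", sid)
            for sid, poly in lane_slices.items()
            if point_in_polygon(x, y, poly)]

assert locate_agent(2.0, 1.0) == [("agent_1", "isOn", "slice_a")]
```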
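The shift- and rotation-invariant normalization in nSTP extraction can be sketched as a translation to the target agent's last observed position followed by a rotation aligning its heading with the +x axis (a schematic under that assumption, not the exact published transform):

```python
import math

def normalize_trajectory(points, heading):
    """Shift points so the agent's last observed position is the origin,
    then rotate so its heading lies along the +x axis."""
    ox, oy = points[-1]                      # last observed position
    c, s = math.cos(-heading), math.sin(-heading)
    out = []
    for x, y in points:
        dx, dy = x - ox, y - oy
        out.append((c * dx - s * dy, s * dx + c * dy))
    return out

# Agent moving north-east with a 45-degree heading (illustrative values).
history = [(10.0, 5.0), (11.0, 6.0), (12.0, 7.0)]
norm = normalize_trajectory(history, heading=math.pi / 4)
assert norm[-1] == (0.0, 0.0)  # last pose maps exactly to the origin
```

Because every subgraph is expressed in this agent-centric frame, a model trained on nSTP examples is invariant to where and in which direction the scene occurred in world coordinates.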
3. Semantic and Spatial Relations
Principal relation types in nSKG encapsulate broad scene semantics. Object properties (edges) include temporal, participant, agent-agent, map-topology, geometry anchoring, and physical proximity:
| Group | Relation | Meaning/Edge Feature |
|---|---|---|
| Temporal | hasNextScene/prevScene | Scene chain t→t+1/t→t–1 |
| Participant | isSceneParticipantOf | Agent↔Scene membership at time t |
| Inter-Agent | follows, parallel | Precedence, lateral alongside |
| Map-Topology | hasNextLane, hasLeftLane | Lane graph adjacency |
| Geometry | laneHasSlice, connectorHasPose | Anchor geometry |
| Crossings | causesStopAt | Light→stop area causality |
| Proximity | isOn, walkwayIsNextTo | Agent@time→map element |
| RoadBlocks | hasNextRoadBlock | Forward block adjacency |
In practice, relations are represented as RDF triples and edge features typically as one-hot or learned embeddings, with spatial edges carrying their type as a feature. Continuous distances are encoded implicitly via existence thresholds, not as explicit edge features (Mlodzian et al., 2023).
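A minimal sketch of this encoding, turning RDF-style triples into one-hot edge-type feature vectors (the triples and IDs are illustrative):

```python
# RDF-style triples from the graph (subject, predicate, object); hypothetical IDs.
triples = [
    ("agent_1", "follows", "agent_2"),
    ("agent_1", "isOn", "slice_a"),
    ("agent_2", "isOn", "slice_b"),
]

# Fixed, sorted relation vocabulary defines the one-hot dimension order.
EDGE_TYPES = sorted({p for _, p, _ in triples})

def edge_features(triples):
    """One-hot edge-type feature per triple; distances stay implicit (an edge
    exists only if the spatial threshold was met during construction)."""
    feats = []
    for _, pred, _ in triples:
        vec = [0.0] * len(EDGE_TYPES)
        vec[EDGE_TYPES.index(pred)] = 1.0
        feats.append(vec)
    return feats

assert edge_features(triples)[0] == [1.0, 0.0]  # "follows" sorts first
```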
4. PyTorch-Geometric API and Downstream Integration
For trajectory prediction, nSKG data is transformed to PyG HeteroData objects, supporting heterogeneous GNN architectures:
- Node Features: Each node type (SceneParticipant, LaneSlice, LaneConnector, etc.) has its own feature tensor, one row per node.
- Edge Indexing: Per-relation edge index tensors encode source/target node connectivity, facilitating message passing per-edge type.
- Past and Future Trajectories: SceneParticipant nodes carry boolean target masks and past-position features; regression targets encode 6 s future displacement at 2 Hz.
- Loader: Batch loading via torch_geometric.loader.DataLoader.
- Model Architecture: Heterogeneous message passing (e.g., PyG HeteroConv) by relation type; MLP head attached to target node embedding for trajectory regression. Mean-squared error is the canonical loss. Alternatives include aggregated edge-type embeddings, Graph Transformer/HGT layers for richer attention, and multi-modal losses (min-ADE, maneuver classification) (Mlodzian et al., 2023).
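The HeteroData layout described above can be sketched schematically with plain Python containers standing in for torch tensors (feature values and node counts are illustrative, not the released schema):

```python
# Schematic stand-in for a PyG HeteroData object: node stores keyed by node
# type, edge stores keyed by (src_type, relation, dst_type) triples.
hetero = {
    "SceneParticipant": {
        "x": [[0.0, 1.2, 0.3], [4.1, 0.7, -0.2]],   # past-position features
        "target_mask": [True, False],                # which agents to predict
        # 6 s horizon at 2 Hz -> 12 (dx, dy) regression targets per target node
        "y": [[[0.5 * t, 0.0] for t in range(12)], None],
    },
    "LaneSlice": {"x": [[3.5], [3.4], [3.6]]},       # e.g. lane-width feature
    # Per-relation edge indices: [source indices, target indices].
    ("SceneParticipant", "isOn", "LaneSlice"): {"edge_index": [[0, 1], [0, 2]]},
    ("LaneSlice", "hasNextLaneSlice", "LaneSlice"): {"edge_index": [[0, 1], [1, 2]]},
}

# Message passing consumes one relation's edge list at a time.
src, dst = hetero[("SceneParticipant", "isOn", "LaneSlice")]["edge_index"]
assert len(src) == len(dst) == 2
```

In the actual PyG objects each of these lists is a tensor (`data["SceneParticipant"].x`, `data["SceneParticipant", "isOn", "LaneSlice"].edge_index`), and HeteroConv dispatches a separate convolution per edge-store key.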
Scalability considerations arise from average subgraph sizes of 1,000–2,000 nodes; subgraph sampling strategies such as ClusterGCN and GraphSAINT are recommended.
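As a simplified illustration of the sampling idea, the sketch below extracts a k-hop neighborhood around a seed node; note that ClusterGCN and GraphSAINT themselves use graph partitioning and importance sampling rather than plain BFS:

```python
from collections import deque

def k_hop_subgraph(edges, seed, k):
    """Return the nodes reachable from `seed` within k hops (undirected BFS),
    the basic neighborhood restriction that keeps training subgraphs small."""
    adj = {}
    for s, d in edges:
        adj.setdefault(s, set()).add(d)
        adj.setdefault(d, set()).add(s)
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the hop budget
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen

# A toy chain graph: 0 - 1 - 2 - 3 - 4
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
assert k_hop_subgraph(edges, seed=0, k=2) == {0, 1, 2}
```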
5. BEV Symbolic Representation and Foundation Models
nSKG enables formal Bird's Eye View (BEV) symbolic extraction for foundation model training (Zhou et al., 24 Mar 2025). Around each ego vehicle, a BEV grid of 20 × 11 cells is constructed, each cell tiling a 2 m × 2 m world-coordinate patch and encoding the scene object(s) it contains by ontology label.
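A sketch of the cell indexing under that geometry (the exact placement of the ego vehicle within the grid is an assumption here, not specified by the source):

```python
ROWS, COLS = 20, 11   # 20 x 11 cells of 2 m x 2 m => a 40 m x 22 m window
CELL = 2.0            # metres per cell side

def world_to_cell(dx, dy):
    """Map an ego-relative offset (dx ahead, dy left) to a grid cell, or
    None if the point falls outside the window."""
    row = int(dx // CELL)
    col = int((dy + COLS * CELL / 2) // CELL)  # centre the ego laterally
    if 0 <= row < ROWS and 0 <= col < COLS:
        return row, col
    return None

# Place a detected car 7 m ahead of the ego, laterally centred.
grid = [["<empty>"] * COLS for _ in range(ROWS)]
cell = world_to_cell(7.0, 0.0)
grid[cell[0]][cell[1]] = "Car"   # ontology label written into the cell
assert cell == (3, 5)
```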
BEV grids are serialized into token sequences:
- Metadata prefix: <country>, <dist> (Δd), <orientation_diff> (Δθ), <scene_start>.
- Grid cell serialization: concept tokens/emission (<concept_sep>), <col_sep>, <row_sep>, <empty> for blanks.
- Scene pairs: the serializations of two consecutive scenes are concatenated.
For training, a token-masking strategy is employed (random span masking via sentinel <Mᵢ> tokens) for span prediction and next-scene prediction. The vocabulary unites 28+ ontology concepts and the spatial delimiter tokens.
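The serialization and span-masking steps can be sketched for a toy 2 × 2 grid (delimiter placement and sentinel naming follow the description above; details of the actual tokenizer may differ):

```python
import random

def serialize(grid):
    """Flatten a BEV grid row by row into the delimiter-token sequence."""
    tokens = []
    for row in grid:
        for cell in row:
            tokens.append(cell)          # ontology concept or "<empty>"
            tokens.append("<col_sep>")
        tokens[-1] = "<row_sep>"         # row boundary replaces last col_sep
    return tokens

def mask_spans(tokens, span=2, seed=0):
    """T5-style span corruption: replace one random span with a sentinel;
    the target sequence is the sentinel followed by the hidden span."""
    rng = random.Random(seed)
    i = rng.randrange(len(tokens) - span)
    corrupted = tokens[:i] + ["<M_0>"] + tokens[i + span:]
    targets = ["<M_0>"] + tokens[i:i + span]
    return corrupted, targets

grid = [["Car", "<empty>"], ["<empty>", "Truck"]]
tokens = serialize(grid)
assert tokens == ["Car", "<col_sep>", "<empty>", "<row_sep>",
                  "<empty>", "<col_sep>", "Truck", "<row_sep>"]
```

The masked sequence trains span prediction; concatenating two serialized scenes and masking the second trains next-scene prediction with the same objective.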
6. Quantitative Characteristics and Experimental Findings
Quantitative statistics for nSKG (Zhou et al., 24 Mar 2025) include:
- RDF triple count: ~43 million (static + dynamic)
- Scene count: ~30,000 (20 s at 10 Hz, ~1,000 scenarios)
- Object occurrence frequencies:
| Concept | Count |
|---|---|
| Walkway | 61,879 |
| Intersection | 51,285 |
| Pedestrian Crossing | 22,448 |
| Car Park Area | 14,575 |
| Traffic Light Stop Area | 13,618 |
| Child | 9 |
| Stroller | 3 |
- Dynamic objects: 9,374; static objects: 371,271
- Lane graph average degree: ~2.3; scene chain length: 200
Key experiments using pre-trained T5 models yielded:
- Scene object prediction (T5-Base): Accuracy 88.7%, Precision 86.6%, Recall 74.4%, F1 78.6%
- Next scene prediction (T5-Base): Accuracy 86.7%, Precision 61.8%, Recall 59.4%, F1 60.3%
- Ablations: Finer grid resolution (2 m) yielded superior accuracy versus coarser (5 m); recall is emphasized because missing safety-critical objects is costlier than false positives.
Zero-shot baselines (LLaMA3.1, ChatGPT) performed significantly below fine-tuned T5 (20–41% acc.), and pre-training (masked span fill) accelerated convergence for scene-prediction tasks.
7. Significance, Current Usage, and Future Directions
nSKG provides an explicit, semantically rich graph representation for traffic scene understanding, unifying raw sensor streams with topological, contextual, and temporal relationships. Its main contributions are the open ontological schema, knowledge graph construction, and dataset release of >40,000 heterograph regression examples (Mlodzian et al., 2023).
Empirical validation for trajectory prediction architectures remains open; nSKG subgraphs are designed to maximize shift- and rotation-invariance. Extension to symbolic foundation models for autonomous driving demonstrates strong spatial and temporal reasoning and forms the basis for further research in comprehensive scene understanding (Zhou et al., 24 Mar 2025).
All scripts, ontology files, RDF triples, and PyG HeteroData artifacts are publicly available at the project repositories, facilitating reproducibility and broad downstream integration.