Argoverse Open Dataset

Updated 26 May 2026
  • Argoverse is a suite of open datasets for AV research, offering rich multimodal data for advanced perception and forecasting tasks.
  • The dataset enables trajectory forecasting with detailed annotations and HD maps, fostering research on motion prediction and traffic scene analysis.
  • Argoverse 2 includes expanded sensor data and maps for complex interactions, aiding diverse applications from object detection to conflict resolution.

Argoverse comprises a suite of open datasets, benchmarks, and supporting tools designed for advanced research in autonomous vehicle (AV) perception, multi-object tracking, trajectory forecasting, and traffic scene understanding. Developed by Argo AI and collaborators, Argoverse systematically addresses the unique requirements of machine learning in urban driving, including dense 3D annotation, urban-scale HD mapping, multimodal sensor coverage, and large-scale motion forecasting. The platform exists in two major generations—Argoverse 1 and the substantially expanded Argoverse 2—each providing distinct datasets and research utilities for the AV community (Chang et al., 2019, Wilson et al., 2023).

1. Dataset Structure and Composition

Argoverse datasets are stratified into multiple components tailored for the primary AV research tasks: perception (object detection and tracking), motion forecasting, and self-supervised lidar sequence modeling.

Argoverse 1

  • 3D Tracking Dataset: 113 driving logs (15–30 s each, ~934,000 labeled 3D bounding boxes) from Pittsburgh and Miami, recorded with a ring of seven cameras (360°, 30 Hz), dual 32-beam LiDAR (10 Hz), and city-aligned 6-DOF ego pose.
  • Motion Forecasting Dataset: 324,557 five-second urban driving scenarios mined from ~1,000 hours, with 2 s history and 3 s forecast for each focal vehicle.
  • HD Map: 290 km of detailed urban lane topology with geometric and semantic vectors (turn direction, traffic control, driveable areas) (Chang et al., 2019).

Argoverse 2

  • Sensor Dataset: 1,000 annotated multimodal sequences (15 s, 10 Hz lidar, 20 Hz camera, ~75 objects/frame, 26 evaluation classes).
  • Lidar Dataset: 20,000 long (30 s) lidar-only scenarios, cumulative 6 million sweeps, for self-supervised and point-cloud forecasting.
  • Motion Forecasting Dataset: 250,000 11 s scenarios (5 s observed, 6 s to forecast), mined for complex, “interesting” interactions. All include detailed HD map layers sampled in six U.S. cities (Wilson et al., 2023, Schofield et al., 2024).

2. Data Modalities, Annotation Schemas, and Map Formats

Sensor Modalities

  • Cameras: Seven panoramic ring cameras (≥2MP, 20–30 Hz), two high-resolution stereo cameras (forward-facing, 20 Hz in AV2).
  • LiDAR: Dual 32-beam sensors (200 m range, dense point clouds, ~100,000 points/sweep) captured at ~10 Hz; motion compensation is applied to all sweeps.
  • Pose and Calibration: Map-aligned 6-DOF egopose and rigorous extrinsic/intrinsic calibration for all sensors in each scenario.

3D Annotations

  • Per-frame 3D cuboids parameterized as (x, y, z, l, w, h, θ) with canonical alignment to the HD map (vehicle coordinate frame origin at rear axle).
  • Tracking uniquely identifies instances across time, including complex occlusions.
  • AV2 supports 26 evaluation categories, spanning all relevant AV actors: vehicles (car, truck, bus, trailer, van), vulnerable road users (pedestrian, bicyclist, motorcyclist), and a range of misc. and “long-tail” classes (Wilson et al., 2023).
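
The cuboid parameterization above can be made concrete with a minimal sketch that expands (x, y, z, l, w, h, θ) into eight corner points. It assumes that (x, y, z) is the cuboid center, that l, w, h are the extents along the box's own axes, and that θ is yaw about the vertical axis in radians; the function name `cuboid_corners` is hypothetical and not part of any Argoverse API.

```python
import numpy as np

def cuboid_corners(x, y, z, l, w, h, theta):
    """Return the 8 corners (shape (8, 3)) of a yaw-rotated 3D cuboid.

    Assumption: (x, y, z) is the cuboid center, l/w/h are extents along the
    box's own x/y/z axes, and theta is yaw about the vertical axis (radians).
    """
    # Corner offsets in the box frame: every sign combination of half-extents.
    signs = np.array(
        [[sx, sy, sz] for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)],
        dtype=float,
    )
    local = signs * np.array([l, w, h]) / 2.0
    c, s = np.cos(theta), np.sin(theta)
    # Rotation about the z axis by theta.
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    # Rotate each corner offset, then translate to the cuboid center.
    return local @ rot.T + np.array([x, y, z])

# A 4 m x 2 m x 1.5 m box centered at (10, 2, 1), rotated by 90 degrees.
corners = cuboid_corners(10.0, 2.0, 1.0, 4.0, 2.0, 1.5, np.pi / 2)
```

With a 90° yaw the box length ends up along the y axis, which is a quick sanity check on the rotation convention.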

Map Representations

  • HD maps encoded as polygonal and polyline graphs with centimeter precision (lanes, crosswalks, driveable polygons, paint markings, ground height raster at 0.3–1 m resolution).
  • Semantic lane/edge attributes: turn direction, intersection flag, traffic control type, speed limit, successor/predecessor connectivity.
  • Maps localized to 100 m ego-centric regions per scenario, provided via APIs in both vector and raster formats (Chang et al., 2019, Wilson et al., 2023, Schofield et al., 2024).
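
As an illustration of the vector map representation, the sketch below stores one lane as a plain dictionary and splits its centerline into consecutive segment vectors. The field names (`centerline`, `turn_direction`, `is_intersection`, `successors`, `predecessors`) are hypothetical stand-ins for the attributes listed above, not the schema of the official map files or API.

```python
import numpy as np

def polyline_to_vectors(polyline):
    """Split an (N, 2) polyline into N-1 tail-to-head vectors [x0, y0, x1, y1]."""
    pts = np.asarray(polyline, dtype=float)
    # Pair each point with its successor along the polyline.
    return np.hstack([pts[:-1], pts[1:]])

# Hypothetical lane record; keys are illustrative, not the official schema.
lane = {
    "id": 1,
    "centerline": [(0.0, 0.0), (5.0, 0.0), (10.0, 1.0)],
    "turn_direction": "none",
    "is_intersection": False,
    "successors": [2],
    "predecessors": [],
}

vectors = polyline_to_vectors(lane["centerline"])
```

Each row of `vectors` is one directed segment, a form that vectorized forecasting models can consume alongside the lane's semantic attributes.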

3. Motion Forecasting and Conflict Resolution

Argoverse datasets were among the first to support large-scale multimodal forecasting, providing detailed tracks for all moving agents and supporting research in both single- and multi-agent trajectory prediction.

  • Forecasting Setup: Each scenario provides 5 s of dense history for up to ten agent classes, forecasting the next 6 s at 10 Hz (Wilson et al., 2023, Schofield et al., 2024).
  • Input Format: Vectorized past states for all agents, HD map polylines and polygons, and ego-centric normalization to standardize geometric context.
  • Prediction Objective: Output K multi-modal trajectory hypotheses per agent (commonly K = 1 for single-prediction evaluation), with evaluation using metrics such as minADE, minFDE, and Miss Rate (MR) (Schofield et al., 2024).
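
The ego-centric normalization mentioned in the input format can be sketched as a translation plus rotation that places a focal agent's last observed position at the origin with its heading along +x. This is a minimal illustration under assumptions: the helper name `normalize_to_agent` is hypothetical, the heading is estimated from the last two observed positions, and the toy history uses 50 steps (5 s at 10 Hz, as in the setup above).

```python
import numpy as np

def normalize_to_agent(points, origin, heading):
    """Translate points so `origin` maps to (0, 0), then rotate by -heading
    so the agent's direction of travel points along +x."""
    c, s = np.cos(-heading), np.sin(-heading)
    rot = np.array([[c, -s], [s, c]])
    shifted = np.asarray(points, dtype=float) - np.asarray(origin, dtype=float)
    return shifted @ rot.T

# Toy history: focal agent driving north (+y) at 10 m/s, 50 steps at 10 Hz.
t = np.arange(50) / 10.0
history = np.stack([np.full(50, 100.0), 200.0 + 10.0 * t], axis=1)

# Assumption: heading is approximated from the last two observed positions.
delta = history[-1] - history[-2]
heading = np.arctan2(delta[1], delta[0])

local = normalize_to_agent(history, history[-1], heading)
```

The same transform would be applied to map polylines and to the other agents' tracks so that all inputs share one frame.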

Surrogate Safety and Efficiency: Conflict Resolution Data

A derivative dataset, the Conflict Resolution Dataset, augments Argoverse 2 motion forecasting by systematically mining and refining >21,000 intersection “conflict” scenarios (5,337 with AVs, 16,094 AV-free), with controlled data rectification, trajectory smoothing, and regime classification (Li et al., 2023). Key innovations include:

  • Scenario Selection: Automated detection of genuine intersection conflicts via buffer-curve intersection, PET < 5 s, d_min < 8 m, and behavioral change criteria.
  • Trajectory Correction: Speed outlier correction (cubic-jerk model), position re-synthesis, self-consistent velocity enforcement, wavelet denoising for AVs.
  • Data Format: For each conflict, time-stamped and smoothed [t, x, y, v_x, v_y, ϕ, a, j], detailed agent and conflict metadata, and map polylines segmented into tail-to-head vectors.
  • Directional Regime Labeling: Comprehensive encoding of interaction type (parallel, crossing, opposing) and spatiotemporal regime (Li et al., 2023).
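
To show what a row of the [t, x, y, v_x, v_y, ϕ, a, j] format contains, the sketch below derives the kinematic columns from positions with plain finite differences. It is not the dataset's correction pipeline (no cubic-jerk model or wavelet denoising), and it assumes that ϕ is the heading of the velocity vector, a is the rate of change of speed, and j is the derivative of a; the function name `state_vector` is hypothetical.

```python
import numpy as np

def state_vector(t, x, y):
    """Build rows [t, x, y, v_x, v_y, phi, a, j] from positions by finite differences."""
    t, x, y = (np.asarray(v, dtype=float) for v in (t, x, y))
    vx = np.gradient(x, t)
    vy = np.gradient(y, t)
    phi = np.arctan2(vy, vx)      # assumed: heading taken from velocity direction
    speed = np.hypot(vx, vy)
    a = np.gradient(speed, t)     # assumed: scalar acceleration along the path
    j = np.gradient(a, t)         # assumed: jerk as the time derivative of a
    return np.column_stack([t, x, y, vx, vy, phi, a, j])

# Toy track: constant 2 m/s^2 acceleration along +x, sampled at 10 Hz for 3 s.
t = np.arange(30) / 10.0
states = state_vector(t, t ** 2, np.zeros_like(t))
```

Finite differences are less accurate at the first and last samples, which is one reason a real pipeline would smooth or re-synthesize trajectories instead.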

4. Metrics, Baseline Methods, and Experimental Results

Perception

  • Detection: Evaluated using per-class AP, mean AP, Average Translation Error (ATE), Average Scale Error (ASE), Average Orientation Error (AOE), and Composite Detection Score (CDS).
  • Tracking: Key metrics include MOTA, MOTP, identity switches, MT/ML, with sample values: vehicle MOTA > 65.5 at ≤ 30 m in Argoverse 1 (Chang et al., 2019).
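
For reference, MOTA in the standard CLEAR MOT formulation is 1 − (FN + FP + IDSW) / GT, with counts summed over all frames. The sketch below evaluates it on made-up counts; the numbers are purely illustrative and are not Argoverse results.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# Illustrative counts only (not taken from any benchmark).
score = mota(false_negatives=150, false_positives=80, id_switches=20, num_gt=1000)
```

MOTA is often reported as a percentage, so a score of this form would be multiplied by 100 before comparison with published values.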

Forecasting

  • Displacement Metrics: ADE, FDE, minADE, minFDE, computed over T frames and K hypotheses.
  • Miss Rate (MR): Fraction of agents exceeding a 2 m final error threshold in all K predictions.
  • Drivable Area Compliance (DAC): Fraction of predicted trajectories remaining within legal driveable area polygons.
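
A minimal sketch of the displacement metrics and miss rate follows, assuming predictions are stored as an array of shape (agents, K, T, 2) and ground truth as (agents, T, 2). Here the minimum is taken independently for ADE and FDE; benchmark implementations may instead select the hypothesis by endpoint error first, so treat this as an illustration rather than the official evaluation code. The function name `forecasting_metrics` is hypothetical.

```python
import numpy as np

def forecasting_metrics(pred, gt, miss_threshold=2.0):
    """pred: (A, K, T, 2) hypotheses; gt: (A, T, 2) ground truth.

    Returns (minADE, minFDE, miss rate), each averaged over the A agents.
    """
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt[:, None], axis=-1)  # (A, K, T) pointwise errors
    ade = dist.mean(axis=-1)                             # (A, K) average error
    fde = dist[..., -1]                                  # (A, K) endpoint error
    min_ade = ade.min(axis=1).mean()
    min_fde = fde.min(axis=1).mean()
    # An agent is a miss if even its best endpoint exceeds the threshold.
    miss_rate = (fde.min(axis=1) > miss_threshold).mean()
    return min_ade, min_fde, miss_rate

# Toy data: 2 agents, K = 2 hypotheses, T = 60 frames (6 s at 10 Hz).
T = 60
gt = np.zeros((2, T, 2))
gt[:, :, 0] = np.arange(T) * 0.5
offsets = np.array([[1.0, 3.0], [3.0, 4.0]])  # lateral offset per agent/hypothesis
pred = np.repeat(gt[:, None], 2, axis=1)      # (2, 2, T, 2) copies of ground truth
pred[..., 1] += offsets[:, :, None]

min_ade, min_fde, miss_rate = forecasting_metrics(pred, gt)
```

In the toy data the first agent has one hypothesis within 2 m and the second has none, so half of the agents count as misses.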

Conflict Resolution Metrics

  • Safety Surrogates: Post-Encroachment Time (PET), the time elapsed between the first road user leaving the conflict point and the second road user entering it; Proportion of Stopping Distance (PSD), quantifying risk relative to braking distance.
  • Efficiency: Minimum Recurrent Clearance Time (MRCT), the minimal time gap that safely separates repeated conflict events, approximating per-lane theoretical flow.
  • Empirical Findings: AVs exhibit more conservative strategies (higher PET as second actor, softer braking), with efficiency (measured via MRCT) decreasing by 8.6% in AV-involved conflicts compared to human-only cases; pedestrian behavior is more variable in AV encounters (Li et al., 2023).
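
PET reduces to a difference of two timestamps once the conflict point occupancy times are known. The sketch below computes it and applies the two numeric screening thresholds quoted in the scenario selection step (PET under 5 s, minimum distance under 8 m). Both function names are hypothetical, and the buffer-curve and behavioral-change criteria are not modeled here.

```python
def post_encroachment_time(first_exit_time, second_entry_time):
    """PET: time between the first actor leaving the conflict point
    and the second actor entering it (both in seconds)."""
    if second_entry_time < first_exit_time:
        raise ValueError("second actor entered before the first actor exited")
    return second_entry_time - first_exit_time

def is_candidate_conflict(pet, d_min, pet_max=5.0, d_max=8.0):
    """Screen on the two numeric thresholds only: PET < 5 s and d_min < 8 m."""
    return pet < pet_max and d_min < d_max

# Illustrative timestamps (seconds) and minimum distance (meters).
pet = post_encroachment_time(first_exit_time=12.4, second_entry_time=14.1)
keep = is_candidate_conflict(pet, d_min=6.0)
```

A smaller PET indicates a closer call, which is why the higher PET values reported for AVs acting as second actor are read as more conservative behavior.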

Representative Results Table

| Task | Dataset Component | Metric | Typical Result |
|---|---|---|---|
| 3D Detection | Sensor Dataset | mAP | Top leaderboard ≈ 0.46 |
| Forecasting | Motion Forecasting | minFDE@6s (K=6) | ≈ 1.36 m (leaderboard) |
| Conflict Res. | Conflict Resolution | Δ MRCT (AV effect) | +8.6% (AV-involved vs. HV-HV) |
| Pointcloud FC | Lidar Dataset | Chamfer@16k logs | 14.0 (SPFNet baseline) |

5. Research Usage and Applications

Argoverse datasets, including specialized derivatives such as the Conflict Resolution Dataset, enable a broad array of research directions:

  • Motion Forecasting: Development/evaluation of models ranging from classical LSTM to advanced vectorized world models (e.g., VRD) for single- and multi-agent, multi-modal trajectory prediction (Schofield et al., 2024).
  • Perception: End-to-end 3D object detection, tracking, long-tail recognition, and joint multimodal fusion.
  • Self-Supervised Learning: Large-scale 3D scene flow, point-cloud forecasting, and feature pretraining exploiting the scale and diversity of lidar logs (Wilson et al., 2023).
  • Human-AV Interaction: Quantitative analysis of human driver and pedestrian reactions, policy calibration, and intersection conflict modeling with AVs present (Li et al., 2023).
  • Benchmarking and Leaderboards: All motion forecasting, 3D detection, and tracking tasks are supported by active leaderboards with standardized splits and withheld test annotations.

6. Data Access, Licensing, and Best Practices

All Argoverse datasets (including the Sensor, Lidar, Motion Forecasting, and Conflict Resolution datasets) are distributed under the Creative Commons Attribution–NonCommercial–ShareAlike 4.0 International license (CC BY-NC-SA 4.0), with downloads and APIs available via argoverse.org and affiliated repositories. Privacy measures are implemented (face/license plate blurring), and researchers are encouraged to use the official splits and benchmark servers for result reporting (Chang et al., 2019, Wilson et al., 2023, Li et al., 2023).

Recommendations for best practices include normalization of coordinates to an ego-centric frame, vectorized HD map consumption, precomputing agent state vectors, and exploiting map semantics for improved prediction and planning (Schofield et al., 2024). Preprocessing of map and track entities within localized ROI is particularly advised to stabilize training and evaluation.

7. Impact, Limitations, and Future Directions

Argoverse raises the bar for AV benchmarking through scenario scale, annotation richness, and scenario diversity:

  • Advances Over Prior Datasets: One order of magnitude more lidar and forecasting data, higher object/frame density, intersection-rich complexity, and expanded object taxonomy relative to KITTI, nuScenes, and Waymo Open Dataset (Wilson et al., 2023).
  • Modeling Opportunities: Self-supervised learning, improved long-horizon and socially-aware forecasting, capacity estimation at intersections, and calibration/validation of micro-simulation tools anchored in real-world behavior (Wilson et al., 2023, Li et al., 2023).
  • Challenges: Long-tail category detection, robust generalization across cities, and the integration of real AV control policy evaluation remain open topics.
  • Future Work: Enriched interaction modeling for pedestrians/cyclists, map automation, egocentric scene understanding, and standardization of surrogate safety/efficiency metrics for AV regulatory evaluation.

A plausible implication is that as AV deployments become more prevalent in mixed urban traffic, datasets such as Argoverse and its derivatives will play an increasingly pivotal role in driving both technical progress and regulatory standardization in autonomous driving research (Li et al., 2023, Wilson et al., 2023, Schofield et al., 2024, Chang et al., 2019).