
Waymo Open Motion Dataset (WOMD)

Updated 3 July 2025
  • WOMD is a large-scale, high-fidelity dataset that captures diverse, multi-agent driving scenarios with detailed 3D annotations for autonomous vehicle research.
  • It features over 570 hours of driving data across six U.S. cities, enriched with high-definition maps and scenario mining to support rigorous algorithm evaluation.
  • The dataset benchmarks predictive models using joint and marginal metrics, fostering advances in motion forecasting, planning, and safe autonomous driving.

The Waymo Open Motion Dataset (WOMD) is a large-scale, high-fidelity dataset designed to advance research in motion forecasting, planning, and behavior modeling for autonomous driving. Built upon high-quality real-world traffic data, WOMD offers extensive multi-agent, multi-class annotated trajectories, high-definition maps, and scenario mining to support the development and rigorous evaluation of prediction and planning algorithms under interactive and complex traffic conditions.

1. Dataset Scale, Structure, and Diversity

WOMD comprises more than 100,000 scenes, each 20 seconds in duration, sampled at 10 Hz, resulting in over 570 hours of unique driving data drawn from 1,750 km of roadways across six U.S. cities. Each scene contains at least one vehicle, with 57% including pedestrians (20% with at least four) and 16% including cyclists. The dataset provides 7.64 million unique tracks and more than 104,000 temporal segments. Each agent is labeled with a 3D bounding box state at each timestep, along with velocity vectors, and paired with corresponding HD 3D maps that detail lane geometry, road edges, traffic controls, and additional context features.
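As a quick sanity check on the figures above, the number of samples per scene and the total duration follow from the scene length and sampling rate. The sketch below uses the rounded values quoted in this section; the constant names are illustrative only.

```python
# Rounded figures quoted in the text; names are illustrative.
SCENE_SECONDS = 20       # duration of each scene
SAMPLE_HZ = 10           # sampling rate
NUM_SEGMENTS = 104_000   # "more than 104,000 temporal segments"

# Samples per scene: 20 s at 10 Hz.
steps_per_scene = SCENE_SECONDS * SAMPLE_HZ

# Total recorded time in hours, if every segment is a full 20 s.
total_hours = NUM_SEGMENTS * SCENE_SECONDS / 3600

print(steps_per_scene)        # 200
print(round(total_hours, 1))  # 577.8
```

The result is consistent with the "over 570 hours" figure stated above.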

The dataset ensures balanced coverage of urban/suburban environments, various intersection types, and a rich distribution of traffic scenarios. Scenes are selected via predicate mining to maximize interactive events such as merges, unprotected turns, intersections, pedestrian/cyclist crossings, and high-acceleration maneuvers. The training, validation, and test splits (70/15/15%) include "interactive" splits focused specifically on annotating and evaluating pairs of interacting agents, while standard splits support joint prediction for up to eight agents per scene.

2. Data Collection, Annotation, and Labeling

WOMD uses an offboard 3D auto-labeling system to generate high-quality 3D bounding boxes for each traffic participant. This multi-stage process (3D detection, multi-object tracking, and object-centric box refinement) uses the entire scene sequence, including future data, and so produces temporally consistent annotations with less perception noise than onboard systems. The result is a position, velocity, heading, and object validity flag for every agent at each 100 ms interval.
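One way to picture the per-timestep labels described above is as a small record per agent and step. The sketch below is an assumption for illustration: the class name `AgentState`, its field names, and the helper `valid_timestamps` are hypothetical and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

STEP_SECONDS = 0.1  # 100 ms between consecutive states, per the text above


@dataclass(frozen=True)
class AgentState:
    """One agent at one timestep (field names are hypothetical)."""
    x: float        # position in meters
    y: float
    heading: float  # radians
    vx: float       # velocity in m/s
    vy: float
    valid: bool     # False when no label exists at this step


def valid_timestamps(track: List[AgentState]) -> List[float]:
    """Return the times (in seconds) at which the agent has a valid state."""
    return [i * STEP_SECONDS for i, s in enumerate(track) if s.valid]


# Example: the agent is unlabeled at the second step.
track = [
    AgentState(0.0, 0.0, 0.0, 5.0, 0.0, True),
    AgentState(0.0, 0.0, 0.0, 0.0, 0.0, False),
    AgentState(1.0, 0.0, 0.0, 5.0, 0.0, True),
]
times = valid_timestamps(track)
```

Keeping the validity flag next to the state makes it easy to mask out unlabeled steps before computing any displacement metric.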

Rich environmental context is established through synchronized HD 3D map layers, including lane polylines, road edges, crosswalks, stop bars, traffic signal states, and signage. For each interaction-focused split, explicit agent pairings are annotated, enabling ground-truth evaluation of cooperative and competitive behaviors.

3. Benchmarking Metrics and Motion Forecasting Tasks

WOMD standardizes comprehensive motion forecasting evaluation with both marginal (single-agent) and joint (multi-agent, interactive) metrics:

  • minADE (Minimum Average Displacement Error):

$$\text{minADE} = \frac{1}{TA} \min_k \sum_a \sum_t \|\hat{s}_{a,t} - s_{a,t}^{k}\|_2$$

  • minFDE (Minimum Final Displacement Error):

$$\text{minFDE} = \frac{1}{A} \min_k \sum_a \|\hat{s}_{a,T} - s_{a,T}^{k}\|_2$$

  • Miss Rate (MR): Fraction of cases where all K predictions for an agent (or agent pair) exceed a dynamic, scenario-adaptive threshold in final displacement.
  • Overlap Rate (OR): Fraction of predicted agent trajectories whose boxes overlap—either with other agents or the environment—signaling physically implausible predictions.
  • mAP (mean Average Precision): Adapts object detection AP to the distribution of future trajectories, incorporating precision-recall curves over semantic modes (e.g., straight, turning, stationary).
  • Additional Multi-Agent Metrics: WOMD defines joint prediction evaluation for interacting agent pairs, with K-best pairing selection and scenario-based error analysis.
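The minADE and minFDE formulas above can be sketched in a few lines of plain Python. This is a minimal illustration and not the official evaluation code: it reads the hatted symbol as the ground-truth position and the superscript k as the index of a joint prediction, works on 2D positions only, and uses hypothetical function names.

```python
import math


def _dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def min_ade(gt, preds):
    """gt[a][t] and preds[k][a][t] are (x, y) tuples; returns joint minADE."""
    num_agents, num_steps = len(gt), len(gt[0])
    best_total = min(
        sum(_dist(gt[a][t], pred[a][t])
            for a in range(num_agents) for t in range(num_steps))
        for pred in preds
    )
    return best_total / (num_agents * num_steps)


def min_fde(gt, preds):
    """Same layout as min_ade, but only the final timestep is compared."""
    num_agents = len(gt)
    best_total = min(
        sum(_dist(gt[a][-1], pred[a][-1]) for a in range(num_agents))
        for pred in preds
    )
    return best_total / num_agents


# Two agents, two timesteps, two joint predictions.
gt = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]
pred_shifted = [[(0, 1), (1, 1)], [(0, 2), (1, 2)]]      # every point off by 1 m
pred_late_fix = [[(3, 4), (1, 0)], [(0, 1), (1, 1)]]     # one early 5 m error
preds = [pred_shifted, pred_late_fix]

ade = min_ade(gt, preds)  # 1.0, from pred_shifted (the other averages 1.25)
fde = min_fde(gt, preds)  # 0.0, from pred_late_fix
```

Note that the minimum over k is taken after summing over agents, so one joint prediction must fit all agents at once. The example also shows that minADE and minFDE can select different predictions.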

These metrics collectively measure trajectory accuracy, uncertainty calibration, multi-modal and physical plausibility, and scene consistency, advancing beyond per-agent measures typical in prior datasets.
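To make the miss-rate definition concrete, here is a simplified per-agent sketch. It assumes a single fixed distance threshold, which is a simplification of the adaptive threshold described above; the function name and the 2.0 m default are hypothetical.

```python
import math


def miss_rate(gt_finals, pred_finals, threshold_m=2.0):
    """Fraction of agents for which every candidate endpoint misses.

    gt_finals[i] is the true final (x, y) of agent i;
    pred_finals[i] is a list of K candidate final (x, y) points.
    threshold_m is a fixed stand-in for the adaptive threshold.
    """
    misses = 0
    for gt, candidates in zip(gt_finals, pred_finals):
        all_far = all(
            math.hypot(c[0] - gt[0], c[1] - gt[1]) > threshold_m
            for c in candidates
        )
        if all_far:
            misses += 1
    return misses / len(gt_finals)


gt_finals = [(10.0, 0.0), (0.0, 10.0)]
pred_finals = [
    [(10.5, 0.0), (20.0, 0.0)],  # first candidate is within 2 m: a hit
    [(0.0, 15.0), (5.0, 5.0)],   # both candidates are farther than 2 m: a miss
]
mr = miss_rate(gt_finals, pred_finals)  # 0.5
```

An agent counts as a miss only when all of its K candidates fall outside the threshold, so adding a single good mode is enough to turn a miss into a hit.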

4. Baselines, Leaderboard Approaches, and Research Outcomes

WOMD enables benchmarking of algorithms ranging from simple kinematic models to advanced neural architectures. Baseline approaches include:

  • Constant Velocity Model: Propagates the last observed velocity; serves as a simple reference point whose minADE/minFDE are high because it cannot capture the turns, stops, and interactions common in these scenarios.
  • LSTM-based Encoders: Use agent histories, optionally augmented with contextual encoders (road graph, traffic signal, higher-order agent interaction), demonstrating significant gains when map and social features are used (e.g., vehicle minADE reduces from 2.63 to 1.34 as components are added).
  • Joint Prediction Models: Baselines concatenate features for explicit agent pairs; direct joint modeling outperforms marginal approaches on interactive splits.
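The constant velocity baseline listed above is simple enough to sketch directly. The function name and the 8-second default horizon are assumptions for illustration; the 0.1 s step matches the 10 Hz sampling described earlier.

```python
def constant_velocity_rollout(x, y, vx, vy, horizon_s=8.0, dt=0.1):
    """Extrapolate the last observed state with unchanged velocity.

    Returns a list of future (x, y) positions, one per dt step.
    horizon_s is an assumed default, not a value taken from the text.
    """
    steps = round(horizon_s / dt)
    return [(x + vx * dt * i, y + vy * dt * i) for i in range(1, steps + 1)]


# A vehicle at the origin moving at 5 m/s along x.
future = constant_velocity_rollout(0.0, 0.0, 5.0, 0.0)
```

Because the rollout ignores the map and other agents, its error grows quickly whenever an agent turns, brakes, or yields, which is why it is useful mainly as a reference point.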

State-of-the-art competition entries on WOMD leverage vectorized transformer encoders, anchor-free dense prediction, ensemble strategies, and iterative refinement (e.g., MTR, DenseTNT, SMART, TrajFlow), achieving leading results in mAP, minADE, and closed-loop realism across standard and interactive splits.

5. Scenario Mining, Risk Filtering, and Behavioral Assessment

WOMD supports sophisticated scenario extraction and risk-mining methodologies to identify valuable and safety-relevant driving events:

  • Predicate-Based Data Mining: SQL-like predicate filtering discovers diverse, interactive events (e.g., lane changes, close passes, approaching stop controls).
  • Scenario Tagging: Layered tagging captures actor activities (longitudinal, lateral), actor-environment interactions (e.g., stop bar compliance), and actor-actor interactions (close proximity, predicted collision via bounding box overlap).
  • Risk-Based Filtering: Probabilistic models compute collision likelihood by integrating over predictive Gaussian uncertainties:

$$P_{\text{coll},i} = \int f_{\text{ego}}(x)\, f_{\text{other},i}(x)\, dx$$

$$R_i(t) = \int_{0}^{s_{\text{max}}} S(s; t)\, \frac{P_{\text{coll},i}(s; t)}{\Delta t}\, ds$$

High-risk situations are selected based on first-order (direct) and second-order (propagated) interactions, enriching downstream training and testing datasets for rare, complex events.
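When both predictive densities in the first integral above are Gaussian, the integral has a closed form: the product of N(mu_a, cov_a) and N(mu_b, cov_b) integrates to a Gaussian density with covariance cov_a + cov_b, evaluated at mu_a - mu_b. The sketch below implements that identity for 2D positions. It is an illustration under that Gaussian assumption, not the cited risk model, and the function name is hypothetical.

```python
import math


def gaussian_overlap_2d(mu_a, cov_a, mu_b, cov_b):
    """Integral over the plane of the product of two 2D Gaussian densities.

    mu_* are (x, y) means; cov_* are 2x2 symmetric covariance matrices
    given as nested lists or tuples.
    """
    # Covariance of the difference of the two positions.
    sxx = cov_a[0][0] + cov_b[0][0]
    sxy = cov_a[0][1] + cov_b[0][1]
    syy = cov_a[1][1] + cov_b[1][1]
    det = sxx * syy - sxy * sxy

    dx = mu_a[0] - mu_b[0]
    dy = mu_a[1] - mu_b[1]
    # Squared Mahalanobis distance, using the explicit 2x2 inverse.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))


identity = [[1.0, 0.0], [0.0, 1.0]]
same_place = gaussian_overlap_2d((0.0, 0.0), identity, (0.0, 0.0), identity)
far_apart = gaussian_overlap_2d((0.0, 0.0), identity, (10.0, 0.0), identity)
```

The value is largest when the two predicted positions coincide and decays with their separation relative to the combined uncertainty, which is the behavior a risk filter needs when ranking agent pairs.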

Scenario datasets extracted from WOMD, both by logic (e.g., traffic signal interactions, intersection maneuvers) and risk-scoring, empower the community to benchmark prediction, planning, and AV system robustness under diverse operational conditions.
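Predicate-based mining of the kind described in this section can be pictured as composing boolean filters over per-scene summaries. The sketch below is purely illustrative: the scene fields (`num_pedestrians`, `max_decel_mps2`), the predicate names, and the 3.0 m/s² threshold are all hypothetical and not taken from the dataset or its tooling.

```python
from typing import Callable, Dict, List

Scene = Dict[str, float]            # hypothetical per-scene summary
Predicate = Callable[[Scene], bool]


def has_pedestrian(scene: Scene) -> bool:
    """True when at least one pedestrian appears in the scene."""
    return scene["num_pedestrians"] > 0


def hard_braking(scene: Scene, threshold: float = 3.0) -> bool:
    """True when peak deceleration reaches the (assumed) threshold in m/s^2."""
    return scene["max_decel_mps2"] >= threshold


def mine_scenes(scenes: List[Scene], predicates: List[Predicate]) -> List[Scene]:
    """Keep the scenes that satisfy every predicate (logical AND)."""
    return [s for s in scenes if all(p(s) for p in predicates)]


scenes = [
    {"id": 1, "num_pedestrians": 0, "max_decel_mps2": 4.0},
    {"id": 2, "num_pedestrians": 2, "max_decel_mps2": 1.0},
    {"id": 3, "num_pedestrians": 1, "max_decel_mps2": 3.5},
]
selected = mine_scenes(scenes, [has_pedestrian, hard_braking])
```

Expressing each condition as a small function keeps the filters composable, much like clauses in a SQL WHERE statement.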

6. Data Extensions, Quality Improvements, and Benchmarking

Recent research on WOMD has produced critical data extensions:

  • WOMD-LiDAR: Adds over 574 hours of high-resolution, synchronized LiDAR range images to support end-to-end forecasting directly from raw sensor input. Standardized compression (delta encoding, quantization) makes large-scale open release tractable and ensures compatibility with prior releases.
  • Traffic Control Device Datasets: Rule-based extraction and wavelet-based denoising methods yield cleaned, maneuver-labeled AV trajectories for >37,000 traffic light and >44,000 stop sign interactions, with anomaly-free acceleration and jerk profiles.
  • Traffic Signal State Rectification: Automated imputation and rectification reduce missing or unknown signal labels (completeness up to 71.7%), and the estimated red-light violation rate drops from 15.7% to 2.9% after correction, which improves model reliability for prediction tasks that depend on regulatory context.
  • Language and Reasoning Datasets: WOMD-Reasoning extends WOMD with 3 million language Q&A pairs capturing explicit agent interactions, intentions, and traffic rule-induced behaviors, demonstrating that encoding these rich scene narratives improves prediction performance (e.g., over 10% reduction in miss rate when language is leveraged).
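The compression ideas named in the WOMD-LiDAR item above (quantization followed by delta encoding) can be illustrated generically. This sketch does not reproduce the actual WOMD-LiDAR format; the function names and the 0.01 step size are assumptions chosen for the example.

```python
def quantize(values, step):
    """Map floats to integer multiples of step (lossy)."""
    return [round(v / step) for v in values]


def delta_encode(ints):
    """Store the first value, then successive differences."""
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def dequantize(ints, step):
    """Map integers back to approximate float values."""
    return [i * step for i in ints]


ranges_m = [10.00, 10.02, 10.05, 9.98]  # neighbouring range readings in meters
step = 0.01
encoded = delta_encode(quantize(ranges_m, step))
restored = dequantize(delta_decode(encoded), step)
```

Neighbouring range readings tend to be similar, so the deltas are small integers that a general-purpose compressor can pack tightly, while the reconstruction error stays within half a quantization step.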

7. Implications for Autonomous Driving Research

WOMD and its extensions support progress in motion forecasting and behavior prediction. The dataset's trajectory and scene diversity, interactive and scenario-rich splits, annotation and calibration protocols, and benchmarks facilitate:

  • Training and evaluation of models capable of robustly generalizing across cities, time, agent class, and rare interactive situations.
  • Development of joint and marginal multi-agent prediction models that account for both explicit and implicit scene constraints.
  • Investigation of scenario selection via formal risk models, supporting targeted validation and data-efficient algorithm improvement.
  • Benchmarking and leaderboard competitions that encourage reproducibility, comparative analysis, and standardization of evaluation metrics.
  • Research in end-to-end perception-to-prediction systems, closed-loop simulation, behavioral evaluation, and explainability.

In summary, the Waymo Open Motion Dataset provides a rich, extensible, and rigorously benchmarked foundation for advancing predictive modeling, scenario understanding, and safe planning in autonomous driving, with wide-ranging utility for both academic and industrial research communities.