Waymo Open Dataset Overview

Updated 8 December 2025
  • Waymo Open Dataset is a large-scale, multi-modal benchmark offering diverse sensor data and extensive annotations for autonomous driving research.
  • It supports tasks such as 2D/3D detection, tracking, segmentation, and motion forecasting through advanced multi-frame fusion and sensor integration techniques.
  • Real-world deployments, rigorous annotation protocols, and innovative quality metrics drive enhanced performance and reliability in autonomous vehicle applications.

The Waymo Open Dataset (WOD) is a large-scale, multi-modal benchmark developed to advance research in autonomous driving, perception, and behavior modeling. Originating from real-world self-driving vehicle deployments, WOD provides extensively annotated sensor data encompassing urban and suburban environments. As of recent releases, WOD and its extensions serve as the foundation for leading challenges in 2D/3D detection, tracking, semantic segmentation, motion forecasting, end-to-end driving, interaction reasoning, behavior validation, action recognition, and rare “long-tail” scenario evaluation. It is widely adopted in both academic and industrial research tracks, owing to its diversity, scale, and alignment with actual self-driving problems.

1. Dataset Composition, Modalities, and Annotation Protocols

WOD consists of synchronized and calibrated data collected from self-driving vehicles instrumented with a comprehensive sensor suite: five LiDARs (one roof, four perimeter; dual returns; ~177k–300k points/frame post-fusion), five cameras (front, front-left/right, side-left/right; resolutions up to 1920×1280; rolling shutter), and high-precision IMU/GNSS for ego localization (Sun et al., 2019). The canonical “perception split” comprises 1,150 scenes—20 seconds long at 10 Hz—for a total of 230,000 multi-sensor frames. Data was recorded across Phoenix, Mountain View, and San Francisco, covering ~76 km² with strong geographic and scenario diversity.
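
Concretely, the perception split is distributed as TFRecord files of Frame protos, one proto per 100 ms tick. The sketch below iterates a segment with the official waymo-open-dataset Python package; the filename is a placeholder for a downloaded segment file.

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

FILENAME = "segment-XXXX_with_camera_labels.tfrecord"  # placeholder path

dataset = tf.data.TFRecordDataset(FILENAME, compression_type="")
for raw in dataset:
    frame = open_dataset.Frame()                  # one 100 ms tick (10 Hz)
    frame.ParseFromString(bytearray(raw.numpy()))
    # Each Frame bundles all five camera images, LiDAR returns, sensor
    # calibrations, the ego pose, and the annotations described below.
    print(frame.context.name, frame.timestamp_micros,
          len(frame.images), len(frame.lasers))
    break  # inspect just the first frame
```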

Annotations include:

  • 3D bounding boxes on LiDAR for vehicles, pedestrians, cyclists, and signage, annotated out to 75 m with consistent track IDs across frames (a label-reading sketch follows this list).
  • 2D detection boxes for the same object classes on all cameras.
  • Exhaustive per-point semantic labels for 23+2 classes (e.g., vehicles, traffic signs, roads, sidewalks) in semantic segmentation tracks (Wu et al., 21 Jul 2024).
  • Multi-view and multi-object tracking IDs for both 2D and 3D detection tracks.
  • Specialized labels for motion forecasting: object-centric trajectories, HD lane-graph maps, and agent intention predictions (Li et al., 5 Jul 2024).
  • Rich action/event awareness via ROAD-Waymo, which overlays agent, action, and location labels on 198k front-camera frames, yielding ~13 million multi-label annotations and extensive cross-region/condition coverage (Khan et al., 3 Nov 2024).
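
Continuing the frame-loading sketch above, the snippet below prints the 7-DoF 3D boxes (center, dimensions, heading) and persistent track IDs attached to one frame; field names follow the public label.proto schema.

```python
from waymo_open_dataset import label_pb2

for label in frame.laser_labels:                  # one entry per annotated object
    box = label.box                               # 7-DoF box in the vehicle frame
    print(label_pb2.Label.Type.Name(label.type),  # e.g. TYPE_VEHICLE
          label.id,                               # track ID, stable across frames
          (box.center_x, box.center_y, box.center_z),
          (box.length, box.width, box.height),
          box.heading)
```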

Annotations are produced with industrial-strength toolchains and undergo multiple rounds of verification; for label consistency, they are stratified into difficulty levels (LEVEL_1, LEVEL_2), a convention borrowed from the KITTI benchmark (Sun et al., 2019).
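
As a hedged sketch, the stratification is commonly described as a point-count rule: boxes supported by five or fewer LiDAR points fall into LEVEL_2, all others into LEVEL_1, with human-assigned difficulty flags taking precedence. The threshold and override behavior below are illustrative assumptions, not the official pipeline.

```python
LEVEL_1, LEVEL_2 = 1, 2

def detection_difficulty(num_lidar_points: int, labeler_level: int = 0) -> int:
    """Illustrative LEVEL_1/LEVEL_2 assignment (assumed <=5-point rule)."""
    if labeler_level:                  # a manually set difficulty flag wins
        return labeler_level
    return LEVEL_2 if num_lidar_points <= 5 else LEVEL_1
```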

2. Core Tasks: Detection, Tracking, Segmentation, and Forecasting

WOD supports a wide array of perception and planning tasks:

  • 2D Object Detection: State-of-the-art models use both one-stage (YOLOR, CenterNet) and two-stage (Cascade R-CNN, FPN) architectures, often combining multi-scale augmentation, ensembling, and scale calibration to handle WOD’s preponderance of small-object instances. Leading models achieve up to 74.43 mAP (LEVEL_2, ensemble) (Huang et al., 2020).
  • 3D Object Detection: LiDAR-centric detectors (PV-RCNN, AFDet, PointPillars) dominate the leaderboard, with multi-frame fusion and “point painting” with camera-derived semantic scores providing consistent gains in mAP/mAPH (see the heading-weight sketch after this list). PV-RCNN couples a sparse-voxel backbone with PointNet-style set abstraction, while AFDet eliminates anchors and NMS entirely via center heatmap-based keypoint encoding (Shi et al., 2020, Ding et al., 2020).
  • Multi-Object Tracking: Online, tracking-by-detection pipelines (notably HorizonMOT) incorporate tailored Kalman-filter motion models, Re-ID features (2D), and specialized cascade/hierarchical association strategies, attaining 45% 2D MOTA/L2 and 63% 3D MOTA/L2 (Wang et al., 2020).
  • Semantic Segmentation: Recent advances use point cloud transformers (e.g., Point Transformer V3 Extreme, vFusedSeg3D) with multi-frame, no-clipping policies, ensemble inference, and LiDAR-image fusion to surpass 74% mIoU on val/test (Wu et al., 21 Jul 2024, Amjad et al., 9 Aug 2024).
  • Motion Forecasting and Interaction Reasoning: WOD’s motion splits provide densely annotated multi-agent trajectory snippets (20 s, 10 Hz), lane-level HD maps, and vehicle intention ground truth. WOMD-Reasoning extends this with ~3 million Q&A pairs categorizing scene, agent, interaction, and compliance reasoning, markedly enriching language-driven, explainable planning research (Li et al., 5 Jul 2024).
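
The mAPH figures cited above weight each true positive by heading accuracy. A compact sketch of that per-match weight, assuming the standard wrap-to-[0, π] formulation (the official metric applies it inside the precision–recall integration), is:

```python
import numpy as np

def heading_weight(pred_heading: float, gt_heading: float) -> float:
    """APH-style weight: 1.0 for a perfect heading, 0.0 for a 180-degree flip."""
    delta = abs(pred_heading - gt_heading) % (2.0 * np.pi)
    delta = min(delta, 2.0 * np.pi - delta)  # wrap into [0, pi]
    return 1.0 - delta / np.pi
```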

3. Dataset Extensions, Specialized Splits, and Quality Improvements

WOD underpins several derived datasets and ongoing quality-control initiatives:

  • ROAD-Waymo: An action-awareness layer integrating 12.9 million labels for event detection, multi-label action reasoning (e.g., MoveAway, Brake, Crossing), and SAT-based logical verification, supporting both agent-centric and scenario-level perception in U.S. domains (Khan et al., 3 Nov 2024).
  • WOMD-Reasoning: A Q&A corpus affording multi-modal, language-guided trajectory prediction, interaction evaluation, and traffic rule adherence analysis via 2.94 million question–answer pairs, used to train language–motion models such as Motion-LLaVA (Li et al., 5 Jul 2024).
  • Risk-Based Filtering: Annotates high-risk first- and second-order driving situations through probabilistic collision-risk models, outperforming rule-based (TTP) and kinematic (Kalman) baselines for valuable-scenario selection (Puphal et al., 30 Jun 2025).
  • Traffic Signal Imputation and Correction: An automated trajectory-based imputation pipeline remediates the 71.7% missing/unknown signal phases, reducing the red-light violation rate from 15.7% to 2.9% and improving the reliability of WOD for planning and compliance benchmarking (Yan et al., 8 Jun 2025); a simplified sketch of the underlying heuristic follows this list.
  • WOD-E2E: Tailored for vision-based end-to-end learning, WOD-E2E comprises 4,021 curated 20 s video segments, each featuring rare long-tail events (<0.03% occurrence), eight synchronized camera streams, ego states, and rater-annotated reference trajectories. It introduces the Rater Feedback Score (RFS), a human-centric evaluation metric designed to align predicted trajectories with safe, legal, and efficient behaviors under critical circumstances (Xu et al., 30 Oct 2025).
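
The signal-imputation idea can be caricatured in a few lines: observed trajectories at an intersection vote on the hidden phase. The function and thresholds below are hypothetical simplifications; the published pipeline models movement-level dynamics far more carefully (Yan et al., 8 Jun 2025).

```python
def impute_phase(crossing_speeds, queued_count):
    """Toy trajectory-based phase vote for one movement (illustrative only).

    crossing_speeds: speeds (m/s) of vehicles crossing the stop line;
    queued_count:    number of vehicles stopped behind it.
    """
    if any(v > 2.0 for v in crossing_speeds):  # traffic flowing through
        return "GREEN"
    if queued_count > 0:                       # traffic held at the line
        return "RED"
    return "UNKNOWN"                           # no evidence either way
```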

4. Notable Methodological Advances in WOD Benchmarks

WOD research has spurred multiple innovations in data representation, model architecture, fusion, and evaluation:

  • Multi-Frame and No-Clipping Strategies: For semantic segmentation, removing point-cloud clipping and fusing multiple annotated frames deliver clear mIoU gains (up to +2.7% absolute) and address range sparsity (Wu et al., 21 Jul 2024).
  • LiDAR–Camera Feature Fusion: Multi-modal architectures, such as vFusedSeg3D, implement projection/alignment modules and feature fusion blocks (GFFM, SFFM) to merge the geometric precision of LiDAR with the semantic richness of images (Amjad et al., 9 Aug 2024).
  • Action-Aware Event Detection: Cross-domain, multi-label annotation schemas (ROAD-Waymo) with logic-constraint SAT checks enable agent–action–event detection tasks, supporting unsupervised domain adaptation protocols (ROAD++) with standardized f-mAP, video-mAP metrics (Khan et al., 3 Nov 2024).
  • Human-Aligned Metric Development: The RFS metric in WOD-E2E goes beyond conventional geometric error (ADE): it measures adherence to rater-defined trust regions and penalizes illegal or unsafe maneuvers with an exponential score decay down to a minimum threshold (sketched below), surfacing critical failure modes that simple L2-based benchmarks overlook (Xu et al., 30 Oct 2025).
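
A minimal sketch of an RFS-like score with those stated properties (full credit inside the rater trust region, exponential decay outside it, a hard floor); the decay scale and floor are assumptions, not the official constants (Xu et al., 30 Oct 2025).

```python
import numpy as np

def rfs_like_score(dist_to_trust_region_m: float,
                   tau_m: float = 2.0,     # decay scale in meters (assumed)
                   floor: float = 0.25     # minimum score (assumed)
                   ) -> float:
    """1.0 inside the trust region; exponential decay with distance outside."""
    if dist_to_trust_region_m <= 0.0:
        return 1.0
    return max(floor, float(np.exp(-dist_to_trust_region_m / tau_m)))
```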

5. Limitations, Quality Validation, and Behavioral Fidelity

Despite its breadth, several studies reveal limitations for certain high-fidelity modeling tasks:

  • Behavioral Fidelity: An independent validation against helicopter-based naturalistic data (PHX) found that WOD, specifically the Waymo Open Motion Dataset (WOMD), underrepresents short headways, aggressive decelerations, and lateral maneuvers relative to real-world Level 4 AV behavior (a headway sketch follows this list). The discrepancy persists after controlling for measurement error and data segmentation; calibrated models trained solely on WOMD therefore risk underestimating the true variability and risk profile of naturalistic driving (Zhang et al., 3 Sep 2025).
  • Signal Data Completeness: The initial WOMD contained 71.7% missing or unknown signal states for movement–time pairs, necessitating extensive imputation as described above (Yan et al., 8 Jun 2025).
  • Smoothing and Segmentation: The trajectory computation pipeline applies proprietary tracking and smoothing, which can mask instantaneous and discontinuous behavioral phenomena, and 20 s clip segmentation omits extended context needed for longitudinal behavior analysis (Zhang et al., 3 Sep 2025).
  • Object and Scenario Diversity: While WOD’s diversity (15× nuScenes by coverage area) substantially mitigates overfitting, cross-city/domain performance studies confirm persistent AP drops when models are trained/tested on different geographies, necessitating domain adaptation methods (Sun et al., 2019, Khan et al., 3 Nov 2024).
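
For intuition on the headway point, the statistic can be computed per timestep from WOMD-style lead/follower tracks. The sketch below uses center-to-center distance as a proxy for bumper-to-bumper spacing; array names and shapes are assumptions for illustration.

```python
import numpy as np

def time_headways(lead_xy: np.ndarray,       # (T, 2) lead-vehicle centers, m
                  follow_xy: np.ndarray,     # (T, 2) follower centers, m
                  follow_speed: np.ndarray,  # (T,) follower speed, m/s
                  eps: float = 0.1) -> np.ndarray:
    """Per-timestep time headway in seconds (center-to-center proxy)."""
    gaps = np.linalg.norm(lead_xy - follow_xy, axis=-1)
    return gaps / np.maximum(follow_speed, eps)  # guard against stopped cars
```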

6. Access, Licensing, and Community Ecosystem

WOD and its extensions are available at https://waymo.com/open under research/non-commercial licenses. Datasets include raw sensor files, annotations, HD maps, evaluation protocols, and baseline implementations. Open-source resources linked to specific contributions—such as risk labels (Puphal et al., 30 Jun 2025), imputed traffic signals (Yan et al., 8 Jun 2025), WOMD-Reasoning code (Li et al., 5 Jul 2024), and challenge leaderboards—support reproducibility and benchmarking.

Research groups and challenge organizers frequently update datasets and competition tasks, accelerating development and public dissemination of high-performance, safety-aligned machine learning solutions for autonomous driving.


References:

  • Sun et al., 2019
  • Huang et al., 2020
  • Shi et al., 2020
  • Ding et al., 2020
  • Wang et al., 2020
  • Wu et al., 21 Jul 2024
  • Amjad et al., 9 Aug 2024
  • Li et al., 5 Jul 2024
  • Khan et al., 3 Nov 2024
  • Puphal et al., 30 Jun 2025
  • Yan et al., 8 Jun 2025
  • Zhang et al., 3 Sep 2025
  • Xu et al., 30 Oct 2025
