One Thousand and One Hours Dataset
- The One Thousand and One Hours Dataset is a large-scale benchmark comprising 1,118 hours of annotated driving data with precise spatiotemporal detail.
- It integrates multimodal sensor outputs from LiDAR, cameras, and radar with detailed HD maps to support motion forecasting, planning, and simulation tasks.
- The dataset enables rigorous evaluation of autonomous driving tasks, including agent forecasting and closed-loop planning, using standardized metrics like FDE and ADE.
The One Thousand and One Hours Dataset is a large-scale, richly annotated benchmark designed for research on data-driven motion prediction, planning, and simulation in autonomous driving. Compiled from an extended drive campaign using a fleet of autonomous vehicles, it constitutes one of the most comprehensive resources for studying the perception, reasoning, and planning systems underlying modern self-driving technologies. The dataset’s construction emphasizes precise spatial-temporal annotations, multimodal sensor fusion outputs, and high-fidelity mapping, targeting development and evaluation of advanced machine learning approaches for agent forecasting, closed-loop trajectory planning, and environment simulation.
1. Dataset Composition and Structure
The dataset was collected over a four-month campaign (October 2019–March 2020) using 20 Level-5 autonomous vehicles traveling a fixed 6.8-mile (≈11 km) suburban loop in Palo Alto, California. The complete log comprises:
- 1,118 hours of driving records, corresponding to 26,344 km of traversed roadways.
- 170,000 scenes, each 25 seconds in duration, sampled at 10 Hz (yielding 4,250,000 seconds of captured perception outputs).
- Data partitioned by vehicle into training (134,000 scenes, 928 h, 21,849 km), validation (11,000 scenes, 78 h, 1,840 km), and test (16,000 scenes, 112 h, 2,656 km) splits, an approximately 83/7/10% division by driving hours. Vehicles are strictly segregated by split to keep the partitions disjoint.
Each scene encapsulates the fusion output of the vehicle’s perception stack, capturing all dynamic agents and context for the full 25-second segment.
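The split protocol can be pictured as assigning entire vehicles, rather than individual scenes, to each partition. The sketch below is a minimal illustration of that idea under assumed placeholder data; the vehicle IDs, scene records, and split ratios are hypothetical and do not reproduce the released partition.

```python
from collections import defaultdict

# Hypothetical scene records (scene_id, vehicle_id); the released assignment is
# fixed, so this only illustrates the vehicle-disjoint splitting idea.
scenes = [(i, f"vehicle_{i % 20:02d}") for i in range(1_000)]

# Group scenes by the vehicle that recorded them.
by_vehicle = defaultdict(list)
for scene_id, vehicle_id in scenes:
    by_vehicle[vehicle_id].append(scene_id)

# Assign whole vehicles to splits (~83/7/10% of vehicles here, purely for illustration).
vehicles = sorted(by_vehicle)
n_train = int(0.83 * len(vehicles))
n_val = int(0.90 * len(vehicles))
train_v, val_v, test_v = vehicles[:n_train], vehicles[n_train:n_val], vehicles[n_val:]

splits = {
    "train": [s for v in train_v for s in by_vehicle[v]],
    "val":   [s for v in val_v for s in by_vehicle[v]],
    "test":  [s for v in test_v for s in by_vehicle[v]],
}

# Because assignment happens at the vehicle level, no vehicle's scenes can leak
# across train/val/test.
assert set(train_v).isdisjoint(val_v) and set(val_v).isdisjoint(test_v) \
    and set(train_v).isdisjoint(test_v)
print({name: len(ids) for name, ids in splits.items()})
```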
2. Sensor Modalities and Perception Outputs
Each vehicle in the recording fleet was equipped with:
- LiDAR: One 64-beam roof-mounted scanner (10 Hz) and two 40-beam bumper sensors.
- Cameras: Seven synchronized roof-mounted units providing 360° horizontal FOV.
- Radar: Five units (four on the roof, one in the front bumper).
All sensors are hardware-synchronized to within 1 ms. Sensor fusion via the in-house perception stack yields a 360° bird's-eye-view (BEV) representation of the scene in a shared world coordinate frame. At each 10 Hz frame, the output includes:
- Dynamic agent tracking: a mean of ≈79 traffic participants (vehicles, pedestrians, cyclists) per timestep, each annotated with:
  - 2.5D bounding cuboid extents (width, length, height).
  - Pose, heading, and yaw rate.
  - Velocity and acceleration.
  - Class probability vector (car, pedestrian, cyclist).
- Traffic-light logic states (e.g., red/yellow/green) associated with contextually relevant lane segments.
- All kinematic quantities are expressed in a shared world frame defined by the local high-definition map.
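A per-frame agent record of the kind described above can be pictured as a simple structure. The field names and types below are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class AgentObservation:
    """Illustrative per-frame record for one tracked agent
    (field names are assumptions, not the released schema)."""
    centroid_xy: Tuple[float, float]          # position in the shared world frame (m)
    extent_wlh: Tuple[float, float, float]    # 2.5D cuboid: width, length, height (m)
    yaw: float                                # heading (rad) in the world frame
    yaw_rate: float                           # rad/s
    velocity_xy: Tuple[float, float]          # m/s
    acceleration_xy: Tuple[float, float]      # m/s^2
    class_probs: Tuple[float, float, float]   # P(car), P(pedestrian), P(cyclist)
    track_id: int                             # persistent identity across the scene

# One 10 Hz frame then holds ~79 such records on average, plus the ego pose
# and the logic states of contextually relevant traffic lights.
frame_example = [
    AgentObservation((10.2, -3.5), (1.9, 4.6, 1.7), 0.12, 0.0,
                     (5.1, 0.4), (0.2, 0.0), (0.97, 0.02, 0.01), track_id=42),
]
```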
3. High-Definition Map and Annotation Protocol
The accompanying high-definition (HD) semantic map encodes complete drivable context for the traversed loop. It is distributed as a protocol buffer and includes:
- Total annotated elements: 15,242, manually labeled.
- Lane segments: 8,505, represented as polylines.
- Explicit lane connectivity graph.
- Driving directionality flags (one-way versus two-way), road class, surface markings, crosswalks, and speed limits.
- Lane-use restrictions (bus-only, turn-only, bike-only), traffic signs, and lights (each with associated positional and logical data).
- Miscellaneous infrastructure: speed bumps, keep-clear/no-parking zones.
- Ultra-high-resolution aerial orthomap: Covers 74 km² at 6 cm/pixel, released as 181 GeoTIFF tiles of 10,560 × 10,560 px.
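As a rough illustration of how the orthomap tiling works out, the sketch below converts a planar offset into a tile index and pixel position at 6 cm/pixel with 10,560 px tiles. The map origin and tile-indexing scheme are hypothetical; the real GeoTIFF tiles carry their own georeferencing metadata, which should be used in practice.

```python
# Minimal sketch: mapping a planar offset (in metres) from a hypothetical
# orthomap origin to a tile index and a pixel within that tile.
PIXEL_SIZE_M = 0.06                          # 6 cm per pixel
TILE_SIZE_PX = 10_560                        # each tile is 10,560 x 10,560 px
TILE_SIZE_M = PIXEL_SIZE_M * TILE_SIZE_PX    # ≈ 633.6 m of ground per tile side

def world_offset_to_tile_pixel(dx_m: float, dy_m: float):
    """Return (tile_col, tile_row, px_x, px_y) for an offset from the map origin."""
    px_global_x = int(dx_m / PIXEL_SIZE_M)
    px_global_y = int(dy_m / PIXEL_SIZE_M)
    tile_col, px_x = divmod(px_global_x, TILE_SIZE_PX)
    tile_row, px_y = divmod(px_global_y, TILE_SIZE_PX)
    return tile_col, tile_row, px_x, px_y

# Example: a point 1 km east and 2 km north of the (hypothetical) origin.
print(world_offset_to_tile_pixel(1_000.0, 2_000.0))
# 181 tiles at ~633.6 m per side cover ~72.7 km², consistent with the quoted
# ~74 km² footprint once partial edge tiles are accounted for.
```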
Annotation workflow involved skilled human annotation atop a SLAM-based mapping pipeline, incorporating systematic geometric verification, automated cross-validation, and on-route integrity checks (e.g., enforced presence of required traffic signals).
4. Data Collection Methodology
- Platform: Lyft Level 5 autonomous vehicles with full sensor suite, calibrated for temporal and spatial consistency.
- Localization: SLAM-derived, achieving centimeter-grade accuracy for vehicle pose; this reference underpins all map/timestamp alignment for both agents and infrastructure.
- Route Design: Repetitive traversal during diverse daylight conditions, capturing variable traffic scenarios, lighting, and weather.
- Sliding-window scene extraction: each 25 s window sampled at 10 Hz (≈250 frames) forms a complete scene for modeling (see the sketch after this list).
- Map annotation: Human annotators worked iteratively on the vector map over the SLAM base, validating lane topology, traffic device placement, and completeness with both automated and manual QC procedures.
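A minimal sketch of the sliding-window idea follows. The frame counts and the non-overlapping stride are assumptions for illustration, not the exact extraction parameters used to build the release.

```python
from typing import List, Sequence

FRAME_RATE_HZ = 10
SCENE_SECONDS = 25
FRAMES_PER_SCENE = FRAME_RATE_HZ * SCENE_SECONDS  # ≈250 frames per scene

def extract_scenes(frames: Sequence[dict], stride: int = FRAMES_PER_SCENE) -> List[Sequence[dict]]:
    """Cut a continuous 10 Hz log into fixed-length scene windows.

    `frames` is a time-ordered sequence of per-frame perception outputs;
    a non-overlapping stride is assumed here purely for illustration.
    """
    scenes = []
    for start in range(0, len(frames) - FRAMES_PER_SCENE + 1, stride):
        scenes.append(frames[start:start + FRAMES_PER_SCENE])
    return scenes

# Example: a one-hour log at 10 Hz yields 36,000 frames -> 144 non-overlapping scenes.
dummy_log = [{"t": i / FRAME_RATE_HZ} for i in range(36_000)]
print(len(extract_scenes(dummy_log)))  # 144
```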
5. Core Tasks and Benchmarking Metrics
The dataset enables rigorous evaluation and comparison for three fundamental autonomous driving tasks:
A. Motion Forecasting
- Task: Given the past few seconds of an agent's observed trajectory and a BEV raster of the semantic/aerial map, predict the agent's future positions over a fixed forecast horizon.
- Metrics:
  - Final Displacement Error (FDE): the L2 distance between the predicted and ground-truth positions at the final timestep of the horizon, FDE = ‖ŷ_T − y_T‖₂.
  - Average Displacement Error (ADE): the mean L2 distance over the full horizon, ADE = (1/T) Σ_{t=1..T} ‖ŷ_t − y_t‖₂, where ŷ_t and y_t denote the predicted and ground-truth positions at timestep t.
- Baseline Results: With a ResNet-50 backbone, adding 1 s of history reduces ADE from ≈1.64 m (no past) to ≈0.77 m (full data) (Houston et al., 2020).
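Both displacement metrics can be computed directly from predicted and ground-truth positions. The numpy sketch below assumes a single-mode prediction of shape (T, 2) and is meant only to make the definitions concrete.

```python
import numpy as np

def displacement_errors(pred_xy: np.ndarray, gt_xy: np.ndarray):
    """ADE and FDE for one agent.

    pred_xy, gt_xy: arrays of shape (T, 2) holding predicted and ground-truth
    positions (metres) over the forecast horizon.
    """
    per_step = np.linalg.norm(pred_xy - gt_xy, axis=-1)  # L2 error at each timestep
    ade = per_step.mean()   # Average Displacement Error
    fde = per_step[-1]      # Final Displacement Error (last timestep only)
    return ade, fde

# Toy example over a 5 s horizon at 10 Hz (T = 50 steps).
t = np.arange(50)[:, None] / 10.0
gt = np.hstack([t * 5.0, np.zeros_like(t)])   # agent moving at 5 m/s along x
pred = gt + np.array([0.0, 0.5])              # constant 0.5 m lateral offset
ade, fde = displacement_errors(pred, gt)
print(f"ADE={ade:.2f} m, FDE={fde:.2f} m")    # both 0.50 m in this toy case
```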
B. Motion Planning (Closed-Loop Imitation)
- Task: Given BEV rasters centered on the self-driving vehicle (SDV), predict ego trajectories over a 5 s horizon and execute them in a closed-loop simulation in which all non-SDV agents replay their logged behavior.
- Evaluation: count SDV-initiated collisions, traffic-rule violations, and off-road departures, normalized per distance driven.
- Protocol: "ChauffeurNet"-style trajectory-perturbation augmentation during training. Reported error rates roughly halve as the fraction of training scenes used grows from 1% to 100%.
C. Simulation/“What-If” Experimentation
- Enables alternative SDV policy injection for empirical measurement of effects on collision and violation rates in logged-replay simulation.
6. Access, Tooling, and Licensing
Distribution is via https://level5.lyft.com/dataset/ under a non-commercial academic-use license; separate licensing is required for redistribution or commercial applications. The accompanying L5Kit software is released under Apache 2.0 and features:
- Multithreaded zarr-based random-access scene sampler.
- Centering API for SDV or arbitrary agents (for planning vs. forecasting).
- Configurable BEV rasterisation (semantic map, aerial, hybrid overlays).
- Visualization and export tools for trajectory analyses.
- Reference PyTorch code for motion forecasting and planning pipeline evaluation.
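A typical L5Kit loading pattern looks roughly like the sketch below. Class and key names follow the public L5Kit repository as of its early releases, but configuration keys, file names, and sample fields may differ between versions, so treat this as an approximate usage example rather than canonical documentation.

```python
# Approximate L5Kit usage (API names as in the public repo; may vary by version).
from l5kit.configs import load_config_data
from l5kit.data import ChunkedDataset, LocalDataManager
from l5kit.dataset import EgoDataset
from l5kit.rasterization import build_rasterizer

dm = LocalDataManager("/path/to/lyft_level5_data")     # root of the downloaded dataset
cfg = load_config_data("./agent_motion_config.yaml")   # rasteriser + dataloader settings

# Open one of the zarr-backed scene collections named in the config.
zarr_dataset = ChunkedDataset(dm.require(cfg["val_data_loader"]["key"])).open()

# Build a BEV rasteriser (semantic map, aerial imagery, or a hybrid, per config).
rasterizer = build_rasterizer(cfg, dm)

# EgoDataset centres rasters on the SDV (planning); AgentDataset centres on other agents.
dataset = EgoDataset(cfg, zarr_dataset, rasterizer)
sample = dataset[0]
print(sample["image"].shape)             # rasterised BEV input tensor
print(sample["target_positions"].shape)  # future ego positions to predict
```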
7. Applications, Limitations, and Future Directions
The scale, spatiotemporal precision, and annotation depth of the One Thousand and One Hours Dataset support a breadth of advanced research outcomes:
- Agent motion forecasting: Training and benchmarking of sequence models predicting pedestrian, cyclist, and vehicle trajectories under dense urban conditions.
- Closed-loop imitation learning: Development and rigorous validation of SDV planning architectures and policies in the presence of real-world agent heterogeneity.
- Simulation research: Aggregate effect analysis for alternative autonomy strategies in a log-replay framework.
- Infrastructure-aware modeling: Integration of HD semantic maps and aerial orthomaps substantially improves reasoning under complex urban geometries.
Baseline experiments on the dataset indicate significant gains in prediction and planning performance from larger training sets and high-fidelity spatial context. However, the dataset is geographically limited to a single suburban-loop corridor, and agent behavioral diversity remains bounded by the operational design domain. Proposed extensions include expansion to multi-city coverage, reactive simulation agents, and additional multimodal context (e.g., weather, roadworks).
Reference: Houston, J., et al. (2020). One Thousand and One Hours: Self-driving Motion Prediction Dataset. arXiv:2006.14480.