The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems (1810.05642v1)

Published 11 Oct 2018 in cs.CV, cs.AI, cs.IR, cs.LG, and stat.ML

Abstract: Scenario-based testing for the safety validation of highly automated vehicles is a promising approach that is being examined in research and industry. This approach heavily relies on data from real-world scenarios to derive the necessary scenario information for testing. Measurement data should be collected at a reasonable effort, contain naturalistic behavior of road users and include all data relevant for a description of the identified scenarios in sufficient quality. However, the current measurement methods fail to meet at least one of the requirements. Thus, we propose a novel method to measure data from an aerial perspective for scenario-based validation fulfilling the mentioned requirements. Furthermore, we provide a large-scale naturalistic vehicle trajectory dataset from German highways called highD. We evaluate the data in terms of quantity, variety and contained scenarios. Our dataset consists of 16.5 hours of measurements from six locations with 110 000 vehicles, a total driven distance of 45 000 km and 5600 recorded complete lane changes. The highD dataset is available online at: http://www.highD-dataset.com

Citations (866)

Summary

  • The paper introduces a novel drone-based approach for capturing naturalistic vehicle trajectories, overcoming traditional sensor limitations.
  • It employs high-resolution video with an adapted U-Net and RTS smoothing, achieving mean positional errors below 3 cm.
  • The dataset offers 16.5 hours of recordings covering 110,000 vehicles, providing a large data basis for safety validation and traffic simulation.

An Analysis of the highD Dataset: A Drone Dataset for Naturalistic Vehicle Trajectories on German Highways

The highD dataset, introduced by Robert Krajewski et al., supports the scenario-based validation of highly automated driving systems. The authors record highway traffic with a camera-equipped drone and extract vehicle trajectories from the video. The result is a large set of naturalistic traffic data from German highways that addresses several limitations of traditional data collection methods.

Motivation and Methodology

Scenario-based safety validation for automated vehicles requires large amounts of high-quality real-world data. Conventional measurement methods, such as onboard sensors, driving tests, and infrastructure sensors, each have drawbacks: limited data quality, occlusion of road users, or driver behavior that changes when the sensors are visible. The authors instead propose recording traffic from an aerial perspective using drones equipped with high-resolution cameras. This mitigates occlusion, and because the drone operates unobtrusively above traffic, road users continue to behave naturally.

The highD dataset comprises 16.5 hours of video recorded at six locations, each covering a road segment of approximately 420 meters. It contains trajectories of 110,000 vehicles across a variety of traffic scenarios. To extract accurate and complete trajectories from the video, the researchers applied computer vision techniques, including an adapted U-Net architecture for vehicle detection and Rauch-Tung-Striebel (RTS) smoothing for trajectory refinement.
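To illustrate the idea behind RTS smoothing, here is a minimal Python sketch. It is not the authors' implementation: it assumes a scalar random-walk state model, and the function name `rts_smooth` and the noise variances `process_var` and `meas_var` are hypothetical choices made only for this example. A forward Kalman filter pass is followed by a backward pass that refines each estimate using later frames.

```python
def rts_smooth(measurements, process_var=0.01, meas_var=1.0):
    """Scalar Kalman filter (forward) followed by an RTS backward pass.

    Assumes a random-walk state model x[k] = x[k-1] + w with variance
    process_var, and direct noisy measurements with variance meas_var.
    Both values are illustrative and not taken from the paper.
    """
    n = len(measurements)
    if n == 0:
        return []

    x_filt = [0.0] * n  # filtered means
    p_filt = [0.0] * n  # filtered variances
    x_pred = [0.0] * n  # one-step predicted means
    p_pred = [0.0] * n  # one-step predicted variances

    # Forward pass: Kalman filter initialized at the first measurement.
    x_filt[0] = float(measurements[0])
    p_filt[0] = meas_var
    x_pred[0], p_pred[0] = x_filt[0], p_filt[0]
    for k in range(1, n):
        x_pred[k] = x_filt[k - 1]
        p_pred[k] = p_filt[k - 1] + process_var
        gain = p_pred[k] / (p_pred[k] + meas_var)
        x_filt[k] = x_pred[k] + gain * (measurements[k] - x_pred[k])
        p_filt[k] = (1.0 - gain) * p_pred[k]

    # Backward pass: RTS recursion pulls each estimate toward later frames.
    x_smooth = x_filt[:]
    for k in range(n - 2, -1, -1):
        c = p_filt[k] / p_pred[k + 1]
        x_smooth[k] = x_filt[k] + c * (x_smooth[k + 1] - x_pred[k + 1])
    return x_smooth


# Hypothetical noisy per-frame positions of one vehicle along the road axis.
noisy_x = [0.0, 1.2, 1.9, 3.1, 3.9, 5.2]
print(rts_smooth(noisy_x))
```

A random-walk model lags behind a steadily moving vehicle, so a practical tracker would likely use a motion model that includes velocity; the scalar version is chosen here only to keep the two passes easy to follow.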

Dataset Composition and Quality

The highD dataset offers a significant quantitative and qualitative improvement over existing datasets such as NGSIM. It includes:

  • Recording Duration: 16.5 hours compared to NGSIM’s 1.5 hours.
  • Vehicles: 110,000 vehicles (90,000 cars and 20,000 trucks) against NGSIM’s 9,206 vehicles.
  • Driven Distance: 45,000 kilometers in highD versus 5,071 kilometers in NGSIM.
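The scale difference can be made concrete with a short calculation. The sketch below uses only the rounded figures quoted in the list above; the dictionary keys and helper names are chosen for this example.

```python
# Figures as quoted in the comparison above (rounded values).
HIGHD = {"hours": 16.5, "vehicles": 110_000, "distance_km": 45_000}
NGSIM = {"hours": 1.5, "vehicles": 9_206, "distance_km": 5_071}


def ratios(a, b):
    """Ratio of each quantity in dataset a to the same quantity in b."""
    return {key: a[key] / b[key] for key in a}


def mean_distance_per_vehicle_m(dataset):
    """Average driven distance per recorded vehicle, in meters."""
    return dataset["distance_km"] * 1000 / dataset["vehicles"]


scale = ratios(HIGHD, NGSIM)
for key, value in scale.items():
    print(f"{key}: {value:.1f}x")
# hours: 11.0x
# vehicles: 11.9x
# distance_km: 8.9x

print(f"highD: {mean_distance_per_vehicle_m(HIGHD):.0f} m per vehicle")  # 409 m
print(f"NGSIM: {mean_distance_per_vehicle_m(NGSIM):.0f} m per vehicle")  # 551 m
```

The roughly 409 m driven per vehicle in highD is close to the approximately 420 m length of each recorded road segment, which is what one would expect if most vehicles are tracked across nearly the whole segment.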

The high-resolution drone footage enables precise vehicle detection, with mean positional errors of less than 3 cm. Post-processing then removes false positive detections and refines the trajectories. Compared with NGSIM, highD also covers a broader range of speeds and a more varied mix of vehicles, so it reflects a wider range of real-world driving conditions.

Implications for Automated Driving Validation

The highD dataset's extensive and detailed trajectories support several uses in automated driving research and development:

  • Safety Validation and Impact Assessment: The highD dataset provides a rich source of naturalistic driving data critical for scenario-based safety validation. This is essential for ensuring that highly automated driving systems can handle real-world complexities.
  • Traffic Simulation and Analysis: Researchers and practitioners can leverage the dataset for developing and validating traffic simulation models, driver behavior models, and road user interaction predictions.
  • Maneuver Analysis: The dataset includes maneuvers critical for validating highly automated driving (HAD), such as lane changes and vehicle following; the paper reports 5600 complete lane changes. The detailed annotation allows these maneuvers to be analyzed under various conditions.
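As a rough illustration of the maneuver analysis mentioned above, the following Python sketch counts lane changes from a per-frame sequence of lane IDs and computes a time headway for vehicle following. The input layout and the function names are assumptions made for this example; they are not the dataset's actual file format or the authors' maneuver definitions.

```python
def count_lane_changes(lane_ids):
    """Count frames at which the lane ID differs from the previous frame."""
    return sum(1 for prev, cur in zip(lane_ids, lane_ids[1:]) if prev != cur)


def time_headway(gap_m, speed_mps):
    """Time headway in seconds: gap to the lead vehicle divided by own speed.

    Returns None for a stationary (or reversing) vehicle, where the
    headway is undefined.
    """
    if speed_mps <= 0:
        return None
    return gap_m / speed_mps


# Hypothetical per-frame lane IDs for one vehicle: lane 2 -> 3 -> 2.
lanes = [2, 2, 2, 3, 3, 3, 2]
print(count_lane_changes(lanes))   # 2

# Hypothetical following situation: 30 m gap at 30 m/s.
print(time_headway(30.0, 30.0))    # 1.0
```

Counting raw lane ID transitions is the simplest possible rule; a stricter definition, such as requiring the whole maneuver to lie within the recorded segment, would need additional checks.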

Future Directions

The success of the highD dataset sets a precedent for future developments in AI-driven traffic analysis and automated vehicle validation. Potential future directions include:

  • Expansion of the Dataset: Continual enlargement of the dataset with additional locations, different times of the day, and varying weather conditions will improve its robustness and applicability.
  • Enhanced Detection and Annotation Methods: Integrating more advanced machine learning algorithms for detection and prediction can further enhance the accuracy and utility of the dataset.
  • Integration with Other Data Sources: Combining drone data with other data sources, such as satellite imagery and ground-based sensors, could provide multi-faceted insights into traffic dynamics.

In conclusion, the highD dataset addresses many limitations of traditional traffic data collection methods, offering a comprehensive and accurate source of naturalistic vehicle trajectories. Its applications extend beyond safety validation to broader research in traffic simulation, driver behavior, and machine learning for autonomous systems, contributing significantly to the advancement of highly automated driving technologies.