ROAD: The ROad event Awareness Dataset for Autonomous Driving (2102.11585v3)

Published 23 Feb 2021 in cs.CV, cs.AI, and cs.RO

Abstract: Humans drive in a holistic fashion which entails, in particular, understanding dynamic road events and their evolution. Injecting these capabilities in autonomous vehicles can thus take situational awareness and decision making closer to human-level performance. To this purpose, we introduce the ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge the first of its kind. ROAD is designed to test an autonomous vehicle's ability to detect road events, defined as triplets composed by an active agent, the action(s) it performs and the corresponding scene locations. ROAD comprises videos originally from the Oxford RobotCar Dataset annotated with bounding boxes showing the location in the image plane of each road event. We benchmark various detection tasks, proposing as a baseline a new incremental algorithm for online road event awareness termed 3D-RetinaNet. We also report the performance on the ROAD tasks of Slowfast and YOLOv5 detectors, as well as that of the winners of the ICCV2021 ROAD challenge, which highlight the challenges faced by situation awareness in autonomous driving. ROAD is designed to allow scholars to investigate exciting tasks such as complex (road) activity detection, future event anticipation and continual learning. The dataset is available at https://github.com/gurkirt/road-dataset; the baseline can be found at https://github.com/gurkirt/3D-RetinaNet.

Citations (78)

Summary

  • The paper introduces ROAD, a novel dataset that models road events as triplets (agent, action, location) to advance autonomous driving scene understanding.
  • It presents a baseline 3D-RetinaNet model using a 3D feature pyramid and focal loss for real-time, multi-label action detection in complex scenarios.
  • Benchmark results show video-mAP for action detection between 20.5% and 33.0% at 0.2 IoU, underscoring the dataset's challenge and real-world applicability.

Insight into the ROad Event Awareness Dataset (ROAD) for Autonomous Driving

The paper presents the ROad event Awareness Dataset (ROAD), a novel dataset tailored to evaluating an autonomous vehicle's capacity to recognize and classify dynamic road events, and thereby to advancing situational awareness in autonomous driving systems. ROAD's central innovation lies in its structure: it conceptualizes road events as triplets combining an active agent, the action(s) it performs, and the scene location(s) where they occur. By capturing events in this form, ROAD goes beyond the object detection and semantic segmentation tasks traditionally emphasized in autonomous driving research.

Key Contributions and Architecture

Dataset Composition

ROAD is derived from the Oxford RobotCar Dataset and comprises 22 videos annotated with bounding boxes that localize each road event in the image plane. Its multi-label annotation spans agents, actions, and locations, marking a step from simple object recognition toward more comprehensive scene understanding. In total, the dataset contains 122K video frames, 560K bounding boxes, and 1.7M individual label instances, covering realistic driving scenarios such as pedestrian movements, vehicle maneuvers, and traffic light states. An illustrative annotation record is sketched below.
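
To make the triplet structure concrete, the sketch below shows one plausible way a single road-event annotation could be represented in Python. The field names (`agent`, `actions`, `locations`, `boxes`) and the example values are illustrative only; they do not reproduce the actual JSON schema shipped with the ROAD dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image coordinates

@dataclass
class RoadEvent:
    """Hypothetical record for one road event: an agent, its actions, and its locations."""
    agent: str                     # e.g. "Pedestrian", "Car", "Cyclist"
    actions: List[str]             # e.g. ["Moving away", "Turning right"] (multi-label)
    locations: List[str]           # e.g. ["In vehicle lane", "At junction"]
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> bounding box

# A toy event: a pedestrian crossing from the right-hand pavement.
event = RoadEvent(
    agent="Pedestrian",
    actions=["Crossing road from right"],
    locations=["On right pavement", "In vehicle lane"],
    boxes={1200: (312.0, 180.0, 340.0, 260.0), 1201: (310.0, 181.0, 338.0, 261.0)},
)
```

The per-frame boxes linked across time form a "tube", the spatio-temporal extent of the event that detectors are expected to recover.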

Action Detection Framework

The baseline model, 3D-RetinaNet, is the paper's main technical contribution. It combines a 3D feature pyramid network with a focal loss suited to multi-label detection, and performs online action detection by adapting the incremental tube-construction algorithm of Singh et al. to link detections over time, a requirement for real-time autonomous driving applications. Its single-stage design keeps inference efficient while remaining able to handle a dataset as complex as ROAD.
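
As a reference point, the snippet below sketches the sigmoid focal loss commonly used for multi-label detection heads of this kind, written in PyTorch. It is a generic formulation rather than the exact loss code from the 3D-RetinaNet repository; the `alpha` and `gamma` defaults follow the original RetinaNet paper.

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Multi-label focal loss: per-class binary cross-entropy, down-weighted for easy examples.

    logits:  (N, C) raw class scores
    targets: (N, C) binary labels (several classes may be active per box)
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    probs = torch.sigmoid(logits)
    p_t = probs * targets + (1 - probs) * (1 - targets)        # prob assigned to the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)    # class-balance weight
    loss = alpha_t * (1 - p_t) ** gamma * ce                   # focal modulation
    return loss.sum() / max(targets.sum().item(), 1.0)         # normalize by number of positives
```

A per-class sigmoid (rather than a softmax over classes) matters here because an agent may perform several actions and occupy several location labels simultaneously, so the classes are not mutually exclusive.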

Benchmarking and Evaluation

The paper reports comprehensive experimental results on ROAD across several tasks: agent detection, action detection, location detection, duplex (agent-action) detection, road event detection, and temporal segmentation of the autonomous vehicle's own actions. Benchmarking with both frame-level mean average precision (frame-mAP) and video-level mean average precision (video-mAP) shows that ROAD is considerably harder than standard benchmarks such as UCF-101-24: video-mAP for action detection ranges between 20.5% and 33.0% at a 0.2 IoU threshold, reflecting the intricacy and realism of the road scenarios ROAD presents compared with traditional action recognition tasks.
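
Video-mAP requires matching predicted and ground-truth action tubes by spatio-temporal overlap. The helper below shows one common way to score that overlap (temporal IoU multiplied by the mean spatial IoU over co-occurring frames); the exact matching protocol used for the ROAD benchmark may differ in detail.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def tube_iou(tube_a, tube_b):
    """Spatio-temporal IoU of two tubes, each given as {frame_index: box}."""
    frames_a, frames_b = set(tube_a), set(tube_b)
    inter_frames = frames_a & frames_b
    if not inter_frames:
        return 0.0
    temporal_iou = len(inter_frames) / len(frames_a | frames_b)
    spatial_iou = np.mean([box_iou(tube_a[f], tube_b[f]) for f in inter_frames])
    return temporal_iou * spatial_iou
```

Under this scheme a predicted tube counts as a true positive when its class matches and its tube IoU with an unmatched ground-truth tube exceeds the chosen threshold (e.g. 0.2); average precision is then computed over the ranked predictions as usual.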

Theoretical and Practical Implications

ROAD lays the foundation for several areas of exploration within autonomous vehicle research:

  1. Complex Activity Detection: By modeling road behaviors as events composed of multiple attributes, ROAD invites algorithms capable of recognizing complex driving scenarios involving multiple simultaneous actions.
  2. Event Anticipation and Intent Prediction: The dataset's structure is conducive to forecasting future events, a pivotal capability for enhancing autonomous vehicles’ decision-making strategies.
  3. Continual Learning: ROAD encourages research into adaptive models that evolve as vehicles encounter diverse road conditions and behaviors, facilitating true lifelong learning in autonomous systems.
  4. Enhanced Decision Making: By providing semantic road event descriptions, ROAD acts as a bridge towards human-like decision-making processes within autonomous driving AI, crucial for realistic deployment in varied environments.

Conclusion and Future Prospects

ROAD signifies a substantial step forward in evaluating and enhancing situational awareness for autonomous driving, refining benchmarks beyond static object and pedestrian detection. Its integration of complex, real-world scenarios presents an opportunity for significant advancements in autonomous vehicle technology. By establishing a new standard for scene understanding, ROAD not only offers crucial insights for current autonomous systems but also sets the stage for future developments in predictive modeling and decision-making algorithms. As the dataset expands and methods evolve, ROAD will remain a pivotal resource in the pursuit of safer and more intelligent autonomous vehicles.