
D$^2$-City: A Large-Scale Dashcam Video Dataset of Diverse Traffic Scenarios

Published 3 Apr 2019 in cs.LG, cs.CV, and stat.ML | (1904.01975v2)

Abstract: Driving datasets accelerate the development of intelligent driving and related computer vision technologies, while substantial and detailed annotations serve as fuels and powers to boost the efficacy of such datasets to improve learning-based models. We propose D$^2$-City, a large-scale comprehensive collection of dashcam videos collected by vehicles on DiDi's platform. D$^2$-City contains more than 10000 video clips which deeply reflect the diversity and complexity of real-world traffic scenarios in China. We also provide bounding boxes and tracking annotations of 12 classes of objects in all frames of 1000 videos and detection annotations on keyframes for the remainder of the videos. Compared with existing datasets, D$^2$-City features data in varying weather, road, and traffic conditions and a huge amount of elaborate detection and tracking annotations. By bringing a diverse set of challenging cases to the community, we expect the D$^2$-City dataset will advance the perception and related areas of intelligent driving.

Citations (60)

Summary

  • The paper presents a comprehensive dashcam video dataset with over 10,000 clips capturing diverse Chinese traffic conditions.
  • Methodology includes meticulous manual annotations using bounding boxes and tracking labels for 12 object classes in 1,000 videos.
  • Implications emphasize the dataset's value in advancing autonomous driving through improved detection and multi-object tracking techniques.

D$^2$-City: A Large-Scale Dashcam Video Dataset of Diverse Traffic Scenarios

The paper presents D$^2$-City, a comprehensive dataset of dashcam videos covering diverse traffic scenarios across China, aimed at advancing intelligent driving technologies. With over 10,000 clips spanning a broad spectrum of road and weather conditions, the dataset serves as a significant resource for both detection and tracking tasks and for developing robust perception models for complex driving environments.

Dataset Overview

The D$^2$-City dataset stands out for its extensive coverage of real-world traffic scenarios in China, recorded by dashcams on vehicles operating on DiDi's platform. Unlike previous datasets, which often lack diversity or focus on a single task, D$^2$-City spans varied environments and conditions, offering more than 10,000 video clips recorded in either 720p HD or 1080p FHD resolution.

Figure 1: Sample video frames from the D$^2$-City data collection.

Data Collection and Specification

Data was collected using dashcams on a selection of vehicles across five Chinese cities, ensuring diverse urban and suburban scenarios. Videos were filtered for quality issues such as image blur and poor lighting, preserving only the most informative clips; the retained videos nonetheless vary in quality and resolution.

The distribution of videos across times of day reveals a focus on capturing well-lit, varied traffic conditions. Daytime scenarios are emphasized in particular, because natural light yields better video quality.

Figure 2: Distribution of videos across different times of day.

D$^2$-City further distinguishes itself by providing detailed annotations for 12 object classes across 1,000 videos, with comprehensive detection and tracking information. These annotated videos are integral for developing algorithms capable of handling the intricate detection and multi-object tracking challenges prevalent in autonomous driving systems.
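As a concrete illustration, both detection and tracking evaluation typically match predicted boxes against annotated ones by intersection over union (IoU). The sketch below assumes axis-aligned boxes in (x1, y1, x2, y2) corner format; this is an illustrative convention, not necessarily the dataset's released annotation format.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A match is usually declared when IoU exceeds a threshold (0.5 is a common choice in detection benchmarks).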

Annotation and Objectives

Annotations in D$^2$-City were produced manually as bounding boxes and tracking labels using an enhanced CVAT system. No learning-based pre-annotations were used, keeping the ground truth free of model bias. The annotated classes include cars, vans, buses, several tricycle types, and even blocks and forklifts, reflecting traffic elements distinctive to China.
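To make the annotation structure concrete, the sketch below models per-frame tracked boxes and groups them into identity-ordered tracks. The field names (frame, track_id, label) are hypothetical conveniences, not D$^2$-City's actual release schema.

```python
from dataclasses import dataclass

@dataclass
class TrackedBox:
    frame: int      # frame index within the clip
    track_id: int   # object identity, persistent across frames
    label: str      # one of the 12 annotated classes, e.g. "car"
    x1: float
    y1: float
    x2: float
    y2: float

def boxes_by_track(boxes):
    """Group per-frame boxes into tracks keyed by track_id, sorted by frame."""
    tracks = {}
    for b in boxes:
        tracks.setdefault(b.track_id, []).append(b)
    for t in tracks.values():
        t.sort(key=lambda b: b.frame)
    return tracks
```

Grouping boxes this way turns the per-frame annotations into the per-identity trajectories that tracking metrics operate on.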

The dataset provides training, validation, and test splits across the 1,000 fully annotated videos, along with detection annotations on keyframes for the remaining collection. This design supports object detection, multi-object tracking, and detection interpolation, all crucial for robust scene understanding.

Figure 3: Statistics on the number of detection bounding boxes of 5 classes (car, van, bus, truck, and person) per frame.
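The detection-interpolation task, which recovers boxes on unlabeled frames lying between annotated keyframes, can be sketched as linear interpolation of box corners. This is a simplifying assumption for illustration; practical systems may use motion models or appearance cues instead.

```python
def interpolate_box(box_a, box_b, frame_a, frame_b, frame):
    """Linearly interpolate box corners between two keyframe annotations.

    Boxes are (x1, y1, x2, y2); frame_a < frame <= frame_b is assumed.
    """
    t = (frame - frame_a) / (frame_b - frame_a)  # fraction of the way to frame_b
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
```

For example, a box halfway between two keyframes is placed at the midpoint of the corresponding corners.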

Comparisons with Existing Datasets

D$^2$-City addresses shortcomings of previous driving datasets such as KITTI, Cityscapes, and BDD100K. While KITTI and Cityscapes focus on specific aspects like stereo camera inputs or semantic segmentation, D$^2$-City provides voluminous video-level annotations crucial for tracking applications. Unlike BDD100K, which centers on US scenarios, D$^2$-City offers abundant annotated data for Chinese traffic conditions, which are underrepresented in existing datasets.

Implications and Future Work

The implications of D$^2$-City are far-reaching within the field of intelligent driving. Its richness and diversity provide an unparalleled resource for training perception algorithms capable of adapting to complex, real-world scenarios. The dataset is also poised to prompt further research into more efficient annotation methods and detection interpolation.

Future updates to D$^2$-City will include expanded keyframe annotations, coverage of all-weather and extreme conditions, and increased representation of rare traffic scenarios. This expansion will secure its standing as an essential resource in the continuing development of autonomous driving technologies.

In conclusion, D$^2$-City significantly enhances the landscape of driving datasets, laying a foundation for innovations in perception and intelligent driving systems. This remarkable collection of dashcam videos and annotations opens a breadth of opportunities for AI advancements in vehicular autonomy.
