Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future (2312.03408v4)

Published 6 Dec 2023 in cs.CV

Abstract: With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem. Current autonomous driving datasets can broadly be categorized into two generations. The first-generation autonomous driving datasets are characterized by relatively simple sensor modalities, smaller data scale, and a restriction to perception-level tasks. KITTI, introduced in 2012, serves as a prominent representative of this initial wave. In contrast, the second-generation datasets exhibit heightened complexity in sensor modalities, greater data scale and diversity, and an expansion of tasks from perception to encompass prediction and control. Leading examples of the second generation include nuScenes and Waymo, introduced around 2019. This comprehensive review, conducted in collaboration with esteemed colleagues from both academia and industry, systematically assesses over seventy open-source autonomous driving datasets from domestic and international sources. It offers insights into various aspects, such as the principles underlying the creation of high-quality datasets, the pivotal role of data engine systems, and the utilization of generative foundation models to facilitate scalable data generation. Furthermore, this review undertakes an exhaustive analysis and discourse regarding the characteristics and data scales that future third-generation autonomous driving datasets should possess. It also delves into the scientific and technical challenges that warrant resolution. These endeavors are pivotal in advancing autonomous innovation and fostering technological enhancement in critical domains. For further details, please refer to https://github.com/OpenDriveLab/DriveAGI.

Open-sourced Data Ecosystem in Autonomous Driving: The Present and Future

The paper surveys the open-source data ecosystem that underpins autonomous driving research. It categorizes and assesses over seventy open-source datasets and discusses both current challenges and future directions in dataset development. Organized around two generations of datasets, it outlines how dataset design has kept pace with the growing range of tasks in autonomous driving.

Overview of Autonomous Driving Datasets

The paper groups existing datasets into two generations. The first generation, typified by KITTI (introduced in 2012), is characterized by simpler sensor modalities, smaller data scale, and perception-level tasks. The second generation, led by nuScenes and Waymo (introduced around 2019), uses more complex sensor modalities, offers greater data scale and diversity, and extends the task set from perception to prediction and control. This shift reflects the field's move toward larger and more complex data in support of more sophisticated autonomous driving systems.
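The two-generation split can be illustrated with a small sketch. The `DatasetRecord` type, the `generation` rule, and the task sets attached to each entry are assumptions made for illustration only; the paper's own categorization also weighs sensor modalities and data scale, which this sketch ignores.

```python
from dataclasses import dataclass

PERCEPTION = "perception"
PREDICTION = "prediction"
CONTROL = "control"


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry; not a schema taken from the paper."""
    name: str
    year: int
    tasks: frozenset


def generation(record: DatasetRecord) -> int:
    """Assumed rule: perception-only datasets are generation 1, all others 2."""
    return 1 if record.tasks <= {PERCEPTION} else 2


# Years follow the summary above; task sets are illustrative placeholders.
CATALOG = [
    DatasetRecord("KITTI", 2012, frozenset({PERCEPTION})),
    DatasetRecord("nuScenes", 2019, frozenset({PERCEPTION, PREDICTION})),
    DatasetRecord("Waymo", 2019, frozenset({PERCEPTION, PREDICTION})),
]

by_generation = {record.name: generation(record) for record in CATALOG}
```

A task-based rule is only one possible proxy; a fuller version would also compare sensor suites and data volume.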

Data Engine System and Dataset Annotation

A significant portion of the paper is devoted to data engine systems, which are central to building large, high-quality datasets. The paper surveys common labeling strategies and tools, such as Amazon SageMaker Ground Truth and Scale AI's Data Engine, and compares them on capabilities and cost. It also discusses automation in data labeling, which improves efficiency and reduces cost.
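One common way automation reduces labeling cost is to accept confident model predictions automatically and send the rest to human annotators. The sketch below shows that idea under stated assumptions: the `Prediction` type, the `triage` function, and the 0.9 threshold are hypothetical, and none of this reflects the API of SageMaker Ground Truth, Scale AI's Data Engine, or a pipeline described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical model output for one frame."""
    frame_id: str
    label: str
    confidence: float


def triage(predictions, threshold=0.9):
    """Split predictions into an auto-accepted list and a human-review list.

    The threshold is an assumed tuning knob, not a value from the paper.
    """
    accepted, review = [], []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            accepted.append(prediction)
        else:
            review.append(prediction)
    return accepted, review


predictions = [
    Prediction("f0", "car", 0.97),
    Prediction("f1", "pedestrian", 0.62),
    Prediction("f2", "cyclist", 0.90),
]
accepted, review = triage(predictions)
```

Raising the threshold sends more frames to human review, trading cost for label quality.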

Next-generation Datasets

The authors argue that a third generation of datasets is needed to meet the demands of modern autonomous driving technology. They propose criteria such as comprehensive sensor coverage, high-quality annotations, and support for newer paradigms like end-to-end frameworks. By drawing on advances in foundation models, such datasets would improve in both scale and quality, in step with industrial and academic developments.

Implications and Future Directions

The review gives researchers a shared overview of the data landscape on which autonomous driving development depends. Practically, it underscores the need for collaboration between academia and industry to improve dataset quality and accessibility. For research, its evaluation of existing datasets points toward further work on long-tail scenarios and more sophisticated simulation techniques.

In summary, the paper offers a roadmap for the continued development of open-source datasets in autonomous driving. By laying out current practices, challenges, and future needs, it serves as a reference for researchers and practitioners working to refine autonomous technologies. Future directions could involve using AI-generated content to simulate rare driving scenarios efficiently, with the goal of improving the robustness and reliability of autonomous systems in real-world settings.

Authors (19)

Hongyang Li
Yang Li
Huijie Wang
Jia Zeng
Pinlong Cai
Huilin Xu
Dahua Lin
Junchi Yan
Feng Xu
Lu Xiong
Jingdong Wang
Futang Zhu
Chunjing Xu
Tiancai Wang
Beipeng Mu
Zhihui Peng
Yu Qiao
Li Chen
Fei Xia

GitHub

GitHub - OpenDriveLab/DriveAGI: Embracing Foundation Models into Autonomous Agent and System (732 stars)