Data-Centric Evolution in Autonomous Driving: A Comprehensive Survey of Big Data System, Data Mining, and Closed-Loop Technologies (2401.12888v2)

Published 23 Jan 2024 in cs.RO and cs.CV

Abstract: The aspiration of next-generation autonomous driving (AD) technology relies on the dedicated integration and interaction of intelligent perception, prediction, planning, and low-level control. The upper bound of AD algorithm performance has hit a significant bottleneck, and the growing consensus in academia and industry is that the key to surmounting it lies in data-centric autonomous driving technology. Recent advances in AD simulation, closed-loop model training, and AD big data engines have yielded valuable experience; however, there is a lack of systematic knowledge and deep understanding of how to build efficient data-centric AD technology for AD algorithm self-evolution and better AD big data accumulation. To fill these research gaps, this article closely reviews the state-of-the-art data-driven autonomous driving technologies, with an emphasis on a comprehensive taxonomy of autonomous driving datasets characterized by milestone generations, key features, and data acquisition settings. Furthermore, we provide a systematic review of existing benchmark closed-loop AD big data pipelines from the industrial frontier, covering closed-loop framework procedures, key technologies, and empirical studies. Finally, future directions, potential applications, limitations, and concerns are discussed to encourage efforts from both academia and industry toward the further development of autonomous driving. The project repository is available at: https://github.com/LincanLi98/Awesome-Data-Centric-Autonomous-Driving.

Data-Centric Evolution in Autonomous Driving: An Analytical Synopsis

The paper presents a meticulous review of the paradigm shift towards data-centric methodologies in autonomous driving (AD). Drawing on the latest advancements, the survey examines the integration of big data systems, data mining, and closed-loop technologies, tracing the field's evolution towards a more data-driven approach. This shift addresses the performance ceiling of current algorithms by pivoting towards comprehensive data-centric technologies.

The paper meticulously classifies and explores the progression of autonomous driving datasets into generational milestones, reflecting the technology's rapid advancement. It details the datasets' acquisition settings and key characteristics, offering a nuanced view of the landscape. For instance, the transition from early, more basic datasets such as KITTI to newer, more complex ones such as DriveLM underscores an evolving emphasis on multi-modal data integration and enhanced scenario variety. DriveLM is a notable example in which large-scale language and vision models are leveraged to improve scenario understanding, addressing challenges such as long-tail data distributions and out-of-distribution detection.
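
To make the long-tail challenge concrete, the sketch below (illustrative only, not taken from the paper) shows one simple way a data engine can rebalance training batches toward rare scenario categories using inverse-frequency sampling; the scenario tags and counts are hypothetical.

```python
import random
from collections import Counter

# Hypothetical scenario tags attached to logged driving clips; the heavy skew
# toward "highway_cruise" mimics the long-tail distribution of real fleet data.
clips = (
    ["highway_cruise"] * 900
    + ["urban_intersection"] * 80
    + ["construction_zone"] * 15
    + ["emergency_vehicle"] * 5
)

# Inverse-frequency weights: rare (long-tail) scenarios are sampled more often.
counts = Counter(clips)
weights = [1.0 / counts[tag] for tag in clips]

# Draw a rebalanced training batch; rare tags now appear far more often than
# their raw share of the logs would suggest.
batch = random.choices(clips, weights=weights, k=32)
print(Counter(batch))
```

In practice, industrial data engines replace this tag-count heuristic with learned scenario mining and uncertainty-based selection, but the rebalancing goal is the same.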

Central to the paper is its examination of state-of-the-art closed-loop systems. It delineates the procedural frameworks from data collection to model deployment, typical in pioneering systems like NVIDIA's MagLev and Tesla's robust data platforms. These platforms exemplify closed-loop paradigms, incorporating comprehensive data ingestion, intelligent selection, dynamic labeling, model training, and iterative feedback through real-world deployment loops. This systematic feedback mechanism demonstrates a shift from static to dynamic model training and deployment, providing insightful implications for continued academic and industrial exploration.
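
The stages of such a loop can be summarized in pseudocode. The sketch below is a minimal, self-contained illustration assuming toy `Clip` and `Model` types; it is not the actual MagLev or Tesla design, and the uncertainty threshold and labeling stub are placeholders.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Clip:
    scenario: str
    label: Optional[str] = None

@dataclass
class Model:
    seen: int = 0
    def uncertainty(self, clip: Clip) -> float:
        # Stand-in for real model confidence; rare scenarios look "harder".
        return 0.9 if clip.scenario != "nominal" else 0.1
    def train(self, data: List[Clip]) -> None:
        self.seen += len(data)

def auto_label(clips: List[Clip]) -> List[Clip]:
    # Placeholder for an offboard auto-labeling stage with human review.
    return [Clip(c.scenario, label=f"label:{c.scenario}") for c in clips]

def closed_loop_iteration(model: Model, fleet_logs: List[Clip],
                          labeled_pool: List[Clip]) -> None:
    # 1. Ingest newly collected fleet logs.
    # 2. Intelligently select clips the current model finds hard.
    hard = [c for c in fleet_logs if model.uncertainty(c) > 0.5]
    # 3. Label the selected clips and grow the training pool.
    labeled_pool.extend(auto_label(hard))
    # 4. Retrain; redeployment then yields new fleet logs, closing the loop.
    model.train(labeled_pool)

logs = [Clip(random.choice(["nominal", "cut-in", "debris"])) for _ in range(100)]
pool: List[Clip] = []
model = Model()
closed_loop_iteration(model, logs, pool)
print(f"labeled {len(pool)} hard clips; model trained on {model.seen} samples")
```

The essential design choice this sketch captures is that labeling and retraining effort is concentrated on the clips the deployed model handles worst, rather than on all collected data uniformly.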

Moreover, the paper explores high-fidelity data generation and simulation technologies employing generative AI, spotlighting breakthroughs such as the CARLA simulator and world models like GAIA-1 and DriveDreamer. These technologies can generate realistic driving scenarios from synthetic sources, addressing the scarcity of rare and challenging driving scenarios in collected data.
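
Conceptually, these world models synthesize scenarios by rolling out future frames autoregressively, conditioned on a prompt and the history generated so far. The toy sketch below illustrates only that rollout loop; the random `world_model_step` is a stand-in for the large learned video/world models used in systems like GAIA-1 and DriveDreamer.

```python
import random

def world_model_step(history, prompt):
    # Stand-in for a learned p(next_frame | history, prompt); real systems emit
    # video frames or structured scene states rather than a speed scalar.
    return {"t": len(history), "prompt": prompt, "ego_speed": random.uniform(0.0, 30.0)}

def generate_scenario(prompt: str, horizon: int = 10):
    # Autoregressive rollout: each synthesized frame is appended to the history
    # that conditions the next prediction.
    history = [{"t": 0, "prompt": prompt, "ego_speed": 10.0}]
    for _ in range(horizon):
        history.append(world_model_step(history, prompt))
    return history

# Synthesize a rare, hard-to-collect scenario from a text description.
frames = generate_scenario("pedestrian crossing at night in heavy rain", horizon=5)
print(len(frames), frames[-1])
```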

The discussion of auto-labeling technologies marks another focal point, emphasizing efficiency and scalability in annotating vast data volumes. The transformation from manual annotation to sophisticated auto-labeling systems, including 3D dynamic and 3D static scene labeling methodologies, reflects a crucial advancement that minimizes labor-intensive processes.
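
One reason offboard auto-labeling can outperform onboard perception is that each object track can be refined using both past and future frames of the full logged sequence. The sketch below is an illustrative smoothing stand-in for the learned track-refinement stages cited in the survey, shown on a single track of noisy box centers.

```python
from statistics import mean
from typing import List, Tuple

def refine_track(centers: List[Tuple[float, float]],
                 window: int = 2) -> List[Tuple[float, float]]:
    """Smooth noisy per-frame box centers using past *and* future frames,
    which is only possible offboard when the whole sequence is available."""
    refined = []
    for i in range(len(centers)):
        lo, hi = max(0, i - window), min(len(centers), i + window + 1)
        xs = [c[0] for c in centers[lo:hi]]
        ys = [c[1] for c in centers[lo:hi]]
        refined.append((mean(xs), mean(ys)))
    return refined

# Noisy per-frame detections (x, y in meters) for one tracked vehicle.
raw = [(0.0, 0.1), (1.2, -0.2), (2.0, 0.3), (3.1, -0.1), (4.0, 0.2)]
print(refine_track(raw))
```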

The paper concludes with an articulation of the prospects and challenges ahead. It anticipates growing dataset maturity and more capable hardware infrastructure to support large AI models, while stressing the need to address data security and privacy concerns. Sustaining trustworthy autonomous systems through explainability and developing personalized autonomous driving recommendations based on user behavior data are underscored as future research avenues.

In essence, the paper underscores the imperative of an evolved, integrated ecosystem for autonomous driving, marrying technological sophistication with practical deployment considerations. It sets a foundational roadmap, encouraging further academic inquiry and industrial collaboration to transcend existing constraints and holistically enhance autonomous driving technologies. This progression towards a more data-centric framework in autonomous driving holds significant promise for shaping the forefront of intelligent transportation systems.

References (65)
  1. Semantickitti: A dataset for semantic scene understanding of lidar sequences. In CVPR, pages 9297–9307, 2019.
  2. Also: Automotive lidar self-supervision by occupancy estimation. In CVPR, pages 13455–13465, 2023.
  3. nuscenes: A multimodal dataset for autonomous driving. In CVPR, pages 11621–11631, 2020.
  4. nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. arXiv preprint arXiv:2106.11810, 2021.
  5. Argoverse: 3d tracking and forecasting with rich maps. In CVPR, pages 8748–8757, 2019.
  6. Scene recognition with prototype-agnostic scene layout. IEEE Transactions on Image Processing, 29:5877–5888, 2020.
  7. End-to-end autonomous driving: Challenges and frontiers, 2023.
  8. End-to-end autonomous driving: Challenges and frontiers. arXiv, 2306.16927, 2023.
  9. Vma: Divide-and-conquer vectorized map annotation system for large-scale driving scene. arXiv preprint arXiv:2304.09807, 2023.
  10. The cityscapes dataset for semantic urban scene understanding. In CVPR, pages 3213–3223, 2016.
  11. A survey on multimodal large language models for autonomous driving. In WACV, pages 958–979, 2024.
  12. Jean-Emmanuel Deschaud. Kitti-carla: a kitti-like dataset generated by carla simulator. arXiv preprint arXiv:2109.00892, 2021.
  13. Exploratory analysis of injury severity under different levels of driving automation (sae level 2-5) using multi-source data, 2023.
  14. Object based scene representations using fisher scores of local subspace projections. In NeurIPS, volume 29, 2016.
  15. Carla: An open urban driving simulator. In Conference on robot learning, pages 1–16, 2017.
  16. Ashok Elluswamy. Keynote speaking: Foundation models for autonomy (tesla), 2023.
  17. Keynote speaking: Inside nvidia’s ai infrastructure for self-driving cars, 2020.
  18. Mit advanced vehicle technology study: Large-scale naturalistic driving study of driver behavior and interaction with automation. IEEE Access, 7:102021–102038, 2019.
  19. Jiyang Gao. Keynote speaking: How data-driven flywheel enables scalable path to full autonomy (momenta), 2023.
  20. Are we ready for autonomous driving? the kitti vision benchmark suite. In CVPR, pages 3354–3361, 2012.
  21. World models. arXiv preprint arXiv:1803.10122, 2018.
  22. Query-based temporal fusion with explicit motion for 3d object detection. In NeurIPS, 2023.
  23. One thousand and one hours: Self-driving motion prediction dataset. In Conference on Robot Learning, pages 409–418. PMLR, 2021.
  24. Gaia-1: A generative world model for autonomous driving. arXiv preprint arXiv:2309.17080, 2023.
  25. A survey on trajectory-prediction methods for autonomous driving. IEEE Transactions on Intelligent Vehicles, 7(3):652–674, 2022.
  26. Differentiable integrated motion prediction and planning with learnable cost function for autonomous driving. IEEE Transactions on Neural Networks and Learning Systems, pages 1–15, 2023.
  27. Level-5 autonomous driving—are we there yet? a review of research literature. ACM Computing Surveys (CSUR), 55(2):1–38, 2022.
  28. Point cloud forecasting as a proxy for 4d occupancy forecasting. In CVPR, pages 1116–1124, 2023.
  29. End-to-end deep learning-based autonomous driving control for high-speed environment. The Journal of Supercomputing, 78(2):1961–1982, 2022.
  30. Sample efficient deep reinforcement learning with online state abstraction and causal transformer model prediction. IEEE Transactions on Neural Networks and Learning Systems, pages 1–15, 2023.
  31. Time3d: End-to-end joint monocular 3d object detection and tracking for autonomous driving. In CVPR, pages 3885–3894, 2022.
  32. Open-sourced data ecosystem in autonomous driving: the present and future. arXiv preprint arXiv:2312.03408, 2023.
  33. Delving into the devils of bird’s-eye-view perception: A review, evaluation and recipe. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–20, 2023.
  34. Improving generative imagination in object-centric world models. In ICML, pages 6140–6149, 2020.
  35. A novel scene classification model combining resnet based transfer learning and data augmentation with a filter. Neurocomputing, 338:191–206, 2019.
  36. A survey on autonomous driving datasets: Data statistic, annotation, and outlook. arXiv preprint arXiv:2401.01454, 2024.
  37. Retrieval augmented classification for long-tail visual recognition. In CVPR, pages 6959–6969, 2022.
  38. Harris M. The radical scope of tesla’s data hoard: Every tesla is providing reams of sensitive data about its driver’s life. IEEE Spectrum, 59(10):40–45, 2022.
  39. Verification and validation methods for decision-making and planning of automated vehicles: A review. IEEE Transactions on Intelligent Vehicles, 2022.
  40. 3d object detection from images for autonomous driving: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–20, 2023.
  41. Occ-bev: Multi-camera unified pre-training via 3d scene reconstruction. arXiv preprint arXiv:2305.18829, 2023.
  42. Vision-based semantic segmentation in scene understanding for autonomous driving: Recent achievements, challenges, and outlooks. IEEE Transactions on Intelligent Transportation Systems, 23(12):22694–22715, 2022.
  43. Offboard 3d object detection from point cloud sequences. In CVPR, pages 6134–6144, 2021.
  44. Nuscenes-qa: A multi-modal visual question answering benchmark for autonomous driving scenario. arXiv preprint arXiv:2305.14836, 2023.
  45. Automated vehicle control developments in the path program. IEEE Transactions on Vehicular Technology, 40(1):114–130, 1991.
  46. Ride-hailing service aware electric taxi fleet management using reinforcement learning. In ICUFN, pages 427–432, 2022.
  47. Drivelm: Driving with graph visual question answering, 2023.
  48. Scalability in perception for autonomous driving: Waymo open dataset. In CVPR, pages 2446–2454, 2020.
  49. A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas. Machine Learning and Knowledge Extraction, 5(4):1680–1716, 2023.
  50. Scene as occupancy. In CVPR, pages 8406–8415, 2023.
  51. Ltp: Lane-based trajectory prediction for autonomous driving. In CVPR, pages 17134–17142, 2022.
  52. Drivedreamer: Towards real-world-driven world models for autonomous driving. arXiv preprint arXiv:2309.09777, 2023.
  53. On the road with gpt-4v (ision): Early explorations of visual-language model on autonomous driving. arXiv preprint arXiv:2311.05332, 2023.
  54. Argoverse 2: Next generation datasets for self-driving perception and forecasting. In NeurIPS, 2021.
  55. Language prompt for autonomous driving. arXiv preprint arXiv:2309.04379, 2023.
  56. Transformation-equivariant 3d object detection for autonomous driving. In AAAI, pages 2795–2802, 2023.
  57. Mv-map: Offboard hd-map generation with multi-view consistency. In CVPR, pages 8658–8668, 2023.
  58. Auto4d: Learning to label 4d objects from sequential point clouds. arXiv preprint arXiv:2101.06586, 2021.
  59. Ad-pt: Autonomous driving pre-training with large-scale point cloud dataset. arXiv preprint arXiv:2306.00612, 2023.
  60. Rethinking closed-loop training for autonomous driving. In ECCV, pages 264–282, 2022.
  61. Ai-tp: Attention-based interaction-aware trajectory prediction for autonomous driving. IEEE Transactions on Intelligent Vehicles, 8(1):73–83, 2023.
  62. Spatiotemporal adaptive attention 3d multiobject tracking for autonomous driving. Knowledge-Based Systems, 267:110442, 2023.
  63. Occworld: Learning a 3d occupancy world model for autonomous driving, 2023.
  64. Dynamically conservative self-driving planner for long-tail cases. IEEE Transactions on Intelligent Transportation Systems, 24(3):3476–3488, 2023.
  65. Nemo: Neural map growing system for spatiotemporal fusion in bird’s-eye-view and bdd-map benchmark. arXiv preprint arXiv:2306.04540, 2023.
Authors (7)
  1. Lincan Li (8 papers)
  2. Wei Shao (95 papers)
  3. Wei Dong (106 papers)
  4. Yijun Tian (29 papers)
  5. Kaixiang Yang (18 papers)
  6. Wenjie Zhang (138 papers)
  7. Qiming Zhang (31 papers)
Citations (6)