LLM4Drive: A Survey of Large Language Models for Autonomous Driving (2311.01043v4)

Published 2 Nov 2023 in cs.AI

Abstract: Autonomous driving technology, a catalyst for revolutionizing transportation and urban mobility, is transitioning from rule-based systems to data-driven strategies. Traditional module-based systems are constrained by cumulative errors among cascaded modules and by inflexible pre-set rules. In contrast, end-to-end autonomous driving systems can avoid error accumulation thanks to their fully data-driven training process, although they often lack transparency due to their "black box" nature, complicating the validation and traceability of decisions. Recently, LLMs have demonstrated abilities including context understanding, logical reasoning, and answer generation. A natural thought is to utilize these abilities to empower autonomous driving. Combining LLMs with foundation vision models could open the door to open-world understanding, reasoning, and few-shot learning, which current autonomous driving systems lack. In this paper, we systematically review the research line of LLMs for Autonomous Driving (LLM4AD). The study evaluates the current state of technological advancements, distinctly outlining the principal challenges and prospective directions for the field. For the convenience of researchers in academia and industry, we provide real-time updates on the latest advances in the field as well as relevant open-source resources via the designated link: https://github.com/Thinklab-SJTU/Awesome-LLM4AD.

Overview of LLM4Drive: A Survey of LLMs for Autonomous Driving

The paper "LLM4Drive: A Survey of LLMs for Autonomous Driving" by Yang et al., provides a comprehensive examination of leveraging LLMs to enhance autonomous driving (AD). The research systematically reviews the potential integration of LLMs into AD systems, addressing technological advancements, challenges, and future directions.

Key Insights and Contributions

The paper underscores a pivotal shift from traditional module-based systems to data-driven, end-to-end autonomous driving solutions. However, these end-to-end systems often lack decision transparency because of their "black box" nature. Introducing LLMs into autonomous driving systems could bridge this gap by improving decision-making, context understanding, and reasoning.

The authors organize LLM applications in autonomous driving into four primary areas:

  1. Planning and Control: LLMs can enhance vehicle decision-making, with approaches classified into fine-tuning pre-trained models and prompt engineering. Representative systems such as DriveMLM and LMDrive leverage multi-modal inputs to generate high-level decision commands (see the sketch after this list).
  2. Perception: Incorporating LLMs is expected to enhance tasks such as prediction, detection, and tracking. For example, HiLM-D integrates high-resolution information for risk-object localization, demonstrating the potential of LLMs to elevate perception in dynamic environments.
  3. Question Answering (QA): LLMs contribute significantly to QA systems by providing in-depth scene interpretation and decision rationalization. These capabilities are crucial for human-centric systems where understanding and interaction are central.
  4. Generation: The application of diffusion models to generate realistic datasets provides an avenue for creating synthetic driving scenarios under various conditions. This can serve as a resource for testing and validation, reducing data collection and annotation costs.
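To make the prompt-engineering route in the planning category concrete, the following is a minimal, hypothetical sketch of how perception outputs might be serialized into a text prompt and mapped back to a discrete high-level command. The scene fields, action vocabulary, and `query_llm` callable are illustrative assumptions, not the interface of DriveMLM, LMDrive, or any other surveyed system.

```python
# Hypothetical sketch of the prompt-engineering route to high-level driving
# decisions. The scene fields, action vocabulary, and `query_llm` helper are
# illustrative assumptions, not the interface of any specific surveyed system.
from dataclasses import dataclass

ACTIONS = ["KEEP_LANE", "CHANGE_LANE_LEFT", "CHANGE_LANE_RIGHT", "SLOW_DOWN", "STOP"]

@dataclass
class SceneDescription:
    ego_speed_mps: float        # ego vehicle speed in m/s
    lead_vehicle_gap_m: float   # distance to the lead vehicle in meters
    traffic_light: str          # e.g. "green", "red", "none"
    navigation_goal: str        # e.g. "turn left at the next intersection"

def build_prompt(scene: SceneDescription) -> str:
    """Serialize perception outputs into a textual prompt for the LLM."""
    return (
        f"You are a driving assistant. Choose exactly one action from {ACTIONS}.\n"
        f"Ego speed: {scene.ego_speed_mps:.1f} m/s\n"
        f"Gap to lead vehicle: {scene.lead_vehicle_gap_m:.1f} m\n"
        f"Traffic light: {scene.traffic_light}\n"
        f"Navigation goal: {scene.navigation_goal}\n"
        "Answer with the action name only."
    )

def decide(scene: SceneDescription, query_llm) -> str:
    """Query the LLM and fall back to a conservative action on an invalid reply."""
    reply = query_llm(build_prompt(scene)).strip().upper()
    return reply if reply in ACTIONS else "SLOW_DOWN"
```

A fine-tuned variant would instead adapt the model's weights on paired scene-action data, so that the command format is learned rather than enforced through instructions at inference time.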

Implications and Future Directions

The integration of LLMs in autonomous driving is poised to offer several theoretical and practical advancements. Theoretically, the ability of LLMs to process multi-modal data and generate coherent responses enhances the overall understanding and interpretation of driving situations. Practically, these models can improve safety, efficiency, and the adaptability of autonomous vehicles to new environments.

The work also highlights the importance of datasets suited for LLM applications in autonomous driving. The exploration of datasets like NuScenes-QA and Reason2Drive expands the scope of LLM4AD by providing intricate driving scenarios and QA pairs essential for training and evaluation.
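As a rough illustration, a driving QA benchmark of this kind pairs questions about a sensor scene with short ground-truth answers. The sketch below uses assumed field names and a simple exact-match score, not the actual NuScenes-QA or Reason2Drive schema or official evaluation protocol, to show how such samples might be represented and scored.

```python
# Illustrative sketch of a driving question-answering sample and a simple
# evaluation loop. Field names and the exact-match metric are assumptions
# for exposition, not the benchmarks' actual schema or protocol.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class DrivingQASample:
    scene_token: str   # identifier of the underlying sensor scene
    question: str      # e.g. "How many pedestrians are crossing ahead?"
    answer: str        # ground-truth short answer, e.g. "two"

def exact_match_accuracy(
    samples: List[DrivingQASample],
    predict: Callable[[str, str], str],
) -> float:
    """Score model answers by normalized exact match against the ground truth."""
    if not samples:
        return 0.0
    correct = sum(
        predict(s.scene_token, s.question).strip().lower() == s.answer.strip().lower()
        for s in samples
    )
    return correct / len(samples)
```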

For future developments, continuous advancements in LLM architectures and their training paradigms hold promise for enhanced performance in AD tasks. The potential for LLMs to address the "long-tail problem" in perception and decision-making remains a critical area for ongoing research.

In conclusion, the survey presented in this paper provides a pivotal understanding of where and how LLMs can be integrated into the autonomous driving pipeline. While challenges such as model interpretability and ethical considerations persist, the intersection of LLMs and autonomous driving offers compelling avenues for innovation and improvement within the domain.

References (130)
  1. Spice: Semantic propositional image caption evaluation, 2016.
  2. Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions, 2023.
  3. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Jade Goldstein, Alon Lavie, Chin-Yew Lin, and Clare Voss, editors, Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics.
  4. Multiple object tracking in recent times: A literature review, 2022.
  5. Evaluating multiple object tracking performance: The clear mot metrics. EURASIP J. Image Video Process., 2008, 2008.
  6. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021.
  7. Rt-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818, 2023.
  8. Rt-2: Vision-language-action models transfer web knowledge to robotic control, 2023.
  9. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  10. nuscenes: A multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027, 2019.
  11. Learning from all vehicles. In CVPR, 2022.
  12. End-to-end autonomous driving: Challenges and frontiers. arXiv preprint arXiv:2306.16927, 2023.
  13. Driving with llms: Fusing object-level vector modality for explainable autonomous driving, 2023.
  14. Masked-attention mask transformer for universal image segmentation. 2022.
  15. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. IEEE Pattern Analysis and Machine Intelligence (PAMI), 2023.
  16. Drive as you speak: Enabling human-like interaction with large language models in autonomous vehicles. arXiv preprint arXiv:2309.10228, 2023.
  17. Receive, reason, and react: Drive as you say with large language models in autonomous vehicles. arXiv preprint arXiv:2310.08034, 2023.
  18. Large language models for autonomous driving: Real-world experiments, 2023.
  19. Parting with misconceptions about learning-based vehicle motion planning. In CoRL, 2023.
  20. Multimodal trajectory prediction conditioned on lane-graph traversals, 2021.
  21. Talk2car: Taking control of your self-driving car. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 2019.
  22. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  23. Talk2bev: Language-enhanced bird’s-eye view maps for autonomous driving, 2023.
  24. Hilm-d: Towards high-resolution understanding in multimodal large language models for autonomous driving, 2023.
  25. Palm-e: An embodied multimodal language model, 2023.
  26. Panoptic nuscenes: A large-scale benchmark for lidar panoptic segmentation and tracking. arXiv preprint arXiv:2109.03805, 2021.
  27. Drive like a human: Rethinking autonomous driving with large language models, 2023.
  28. Gptscore: Evaluate as you desire, 2023.
  29. Magicdrive: Street view generation with diverse 3d geometry control, 2023.
  30. Gohome: Graph-oriented heatmap output for future motion estimation, 2021.
  31. Generative adversarial networks, 2014.
  32. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  33. End-to-end training of object class detectors for mean average precision, 2017.
  34. Denoising diffusion probabilistic models, 2020.
  35. Sim2real in robotics and automation: Applications and challenges. IEEE transactions on automation science and engineering, 18(2):398–400, 2021.
  36. Gaia-1: A generative world model for autonomous driving. arXiv preprint arXiv:2309.17080, 2023.
  37. Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17853–17862, 2023.
  38. Gpt-4v takes the wheel: Evaluating promise and challenges for pedestrian behavior prediction, 2023.
  39. The detection and rectification for identity-switch based on unfalsified control, 2023.
  40. Autonomy 2.0: Why is self-driving always 5 years away? arXiv preprint arXiv:2107.08142, 2021.
  41. Ide-net: Interactive driving event and pattern extraction from human data. IEEE Robotics and Automation Letters, 6(2):3065–3072, 2021.
  42. Towards capturing the temporal dynamics for trajectory prediction: a coarse-to-fine approach. In CoRL, 2022.
  43. Multi-agent trajectory prediction by combining egocentric and allocentric views. In Conference on Robot Learning, pages 1434–1443. PMLR, 2022.
  44. Adriver-i: A general world model for autonomous driving, 2023.
  45. Driveadapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving, 2023.
  46. Hdgt: Heterogeneous driving graph transformer for multi-agent trajectory prediction via scene encoding. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
  47. Think twice before driving: Towards scalable decoders for end-to-end autonomous driving, 2023.
  48. Adapt: Action-aware driving caption transformer, 2023.
  49. Surrealdriver: Designing generative driver agent simulation framework in urban contexts based on large language model, 2023.
  50. Can you text what is happening? integrating pre-trained language encoders into trajectory prediction models for autonomous driving, 2023.
  51. Text2video-zero: Text-to-image diffusion models are zero-shot video generators, 2023.
  52. Textual explanations for self-driving vehicles. Proceedings of the European Conference on Computer Vision (ECCV), 2018.
  53. Grounding human-to-vehicle advice for self-driving vehicles, 2019.
  54. Grounding human-to-vehicle advice for self-driving vehicles. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  55. Auto-encoding variational bayes, 2022.
  56. Delving into the devils of bird’s-eye-view perception: A review, evaluation and recipe. arXiv preprint arXiv:2209.05324, 2022.
  57. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation, 2022.
  58. Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. In ECCV, pages 1–18. Springer, 2022.
  59. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models, 2023.
  60. Graph-based topology reasoning for driving scenes. arXiv preprint arXiv:2304.05277, 2023.
  61. Drivingdiffusion: Layout-guided multi-view driving scene video generation with latent diffusion model. arXiv preprint arXiv:2310.07771, 2023.
  62. Pnpnet: End-to-end perception and prediction with tracking in the loop. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11553–11562, 2020.
  63. Multimodality helps unimodality: Cross-modal few-shot learning with multimodal models, 2023.
  64. Visual instruction tuning, 2023.
  65. Mtd-gpt: A multi-task decision-making gpt model for autonomous driving at unsignalized intersections, 2023.
  66. G-eval: Nlg evaluation using gpt-4 with better human alignment, 2023.
  67. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In ICRA, 2023.
  68. Fast and furious: Real time end-to-end 3d detection, tracking and motion forecasting with a single convolutional net. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 3569–3577, 2018.
  69. Valley: Video assistant with large language model enhanced ability, 2023.
  70. Videofusion: Decomposed diffusion models for high-quality video generation, 2023.
  71. Dolphins: Multimodal language model for driving, 2023.
  72. Lampilot: An open benchmark dataset for autonomous driving with language model programs, 2023.
  73. Drama: Joint risk localization and captioning in driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1043–1052, 2023.
  74. Gpt-driver: Learning to drive with gpt. arXiv preprint arXiv:2310.01415, 2023.
  75. A language agent for autonomous driving, 2023.
  76. Lingoqa: Video question answering for autonomous driving, 2023.
  77. Reason2drive: Towards interpretable and chain-based reasoning for autonomous driving, 2023.
  78. OpenAI. Gpt-4 technical report, 2023.
  79. Training language models to follow instructions with human feedback, 2022.
  80. Proto-clip: Vision-language prototypical network for few-shot learning, 2023.
  81. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics, 2002.
  82. On aliased resizing and surprising subtleties in gan evaluation, 2022.
  83. Nuscenes-qa: A multi-modal visual question answering benchmark for autonomous driving scenario. arXiv preprint arXiv:2305.14836, 2023.
  84. Improving language understanding by generative pre-training. 2018.
  85. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
  86. Learning transferable visual models from natural language supervision, 2021.
  87. Hierarchical text-conditional image generation with clip latents, 2022.
  88. Generalized intersection over union: A metric and a loss for bounding box regression, 2019.
  89. Variational inference with normalizing flows, 2016.
  90. High-resolution image synthesis with latent diffusion models, 2021.
  91. U-net: Convolutional networks for biomedical image segmentation, 2015.
  92. Rank2tell: A multimodal driving dataset for joint importance ranking and reasoning. arXiv preprint arXiv:2309.06597, 2023.
  93. Perceive, predict, and plan: Safe motion planning through interpretable semantic representations. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIII 16, pages 414–430. Springer, 2020.
  94. Languagempc: Large language models as decision makers for autonomous driving. arXiv preprint arXiv:2310.03026, 2023.
  95. Lmdrive: Closed-loop end-to-end driving with large language models, 2023.
  96. Motion transformer with global intention localization and local movement refinement. Advances in Neural Information Processing Systems, 35:6531–6543, 2022.
  97. ep-alm: Efficient perceptual augmentation of language models, 2023.
  98. Drivelm: Driving with graph visual question answering. arXiv preprint arXiv:2312.14150, 2023.
  99. Street-view image generation from a bird’s-eye view layout. arXiv preprint arXiv:2301.04634, 2023.
  100. Evaluation of large language models for decision making in autonomous driving, 2023.
  101. Domain knowledge distillation from large language model: An empirical study in the autonomous driving domain, 2023.
  102. Performance evaluation of deep learning networks for semantic segmentation of traffic stereo-pair images. In Proceedings of the 19th International Conference on Computer Systems and Technologies. ACM, sep 2018.
  103. Llama: Open and efficient foundation language models, 2023.
  104. Congested traffic states in empirical observations and microscopic simulations. Physical review E, 62(2):1805, 2000.
  105. Towards accurate generative models of video: A new metric and challenges, 2019.
  106. Cider: Consensus-based image description evaluation, 2015.
  107. Is chatgpt a good nlg evaluator? a preliminary study, 2023.
  108. Chatgpt as your vehicle co-pilot: An initial attempt. IEEE Transactions on Intelligent Vehicles, pages 1–17, 2023.
  109. Drivemlm: Aligning multi-modal large language models with behavioral planning states for autonomous driving. arXiv preprint arXiv:2312.09245, 2023.
  110. Drivedreamer: Towards real-world-driven world models for autonomous driving. arXiv preprint arXiv:2309.09777, 2023.
  111. Empowering autonomous driving with large language models: A safety perspective, 2023.
  112. Driving into the future: Multiview visual forecasting and planning with world model for autonomous driving, 2023.
  113. Dilu: A knowledge-driven approach to autonomous driving with large language models. arXiv preprint arXiv:2309.16292, 2023.
  114. On the road with gpt-4v(ision): Early explorations of visual-language model on autonomous driving, 2023.
  115. Policy pre-training for autonomous driving via self-supervised geometric modeling. In The Eleventh International Conference on Learning Representations, 2022.
  116. Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline, 2022.
  117. Language prompt for autonomous driving, 2023.
  118. Sutd-trafficqa: A question answering benchmark and an efficient network for video reasoning over traffic events. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9878–9888, June 2021.
  119. Sutd-trafficqa: A question answering benchmark and an efficient network for video reasoning over traffic events, 2021.
  120. Bits: Bi-level imitation for traffic simulation, 2022.
  121. Drivegpt4: Interpretable end-to-end autonomous driving via large language model, 2023.
  122. Bevcontrol: Accurately controlling street-view elements with multi-perspective consistency via bev sketch layout. arXiv preprint arXiv:2308.01661, 2023.
  123. Human-centric autonomous systems with llms for user command reasoning, 2023.
  124. The dawn of lmms: Preliminary explorations with gpt-4v(ision), 2023.
  125. Center-based 3d object detection and tracking, 2021.
  126. Motr: End-to-end multiple-object tracking with transformer. In European Conference on Computer Vision (ECCV), 2022.
  127. Video-llama: An instruction-tuned audio-visual language model for video understanding. arXiv preprint arXiv:2306.02858, 2023.
  128. Trafficgpt: Viewing, processing and interacting with traffic foundation models. arXiv preprint arXiv:2309.06719, 2023.
  129. Guided conditional diffusion for controllable traffic simulation, 2022.
  130. Language-guided traffic simulation via scene-level diffusion, 2023.
Authors (4)
  1. Zhenjie Yang (7 papers)
  2. Xiaosong Jia (21 papers)
  3. Hongyang Li (99 papers)
  4. Junchi Yan (241 papers)
Citations (53)