DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models (2309.16292v3)
Abstract: Recent advances in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and lack of interpretability. Drawing inspiration from the knowledge-driven nature of human driving, we explore how to instill similar capabilities into autonomous driving systems and, to that end, summarize a paradigm that integrates an interactive environment, a driver agent, and a memory component. Leveraging the emergent abilities of large language models (LLMs), we propose the DiLu framework, which combines a Reasoning module and a Reflection module so that the system can make decisions based on common-sense knowledge and evolve continuously. Extensive experiments show that DiLu can accumulate experience and that it has a significant advantage in generalization over reinforcement learning-based methods. Moreover, DiLu can acquire experience directly from real-world datasets, which highlights its potential for deployment on practical autonomous driving systems. To the best of our knowledge, we are the first to leverage knowledge-driven capabilities in decision-making for autonomous vehicles. Through the proposed DiLu framework, the LLM is strengthened to apply knowledge and to reason causally in the autonomous driving domain. Project page: https://pjlab-adg.github.io/DiLu/
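The closed loop described above (environment, driver agent, memory, with a Reasoning step that decides and a Reflection step that writes lessons back) can be illustrated with the minimal sketch below. It is not the DiLu implementation. Every name in it (`Experience`, `Memory`, `reason`, `reflect`, `stub_llm`, `toy_env_step`) is hypothetical. The LLM is replaced by a keyword-matching stub, memory retrieval uses plain word-overlap (Jaccard) similarity as a stand-in for whatever retrieval a real system would use, and the correction rule inside `reflect` is an assumption made only so the loop runs end to end.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical text-in/text-out interface standing in for a real LLM client.
LLM = Callable[[str], str]


@dataclass
class Experience:
    scenario: str  # textual description of the driving scene
    decision: str  # meta-action that was taken, e.g. "decelerate"
    note: str      # lesson written by the reflection step


@dataclass
class Memory:
    items: List[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.items.append(exp)

    def retrieve(self, scenario: str, k: int = 2) -> List[Experience]:
        # Placeholder similarity (assumption): Jaccard overlap of word sets.
        # A real system would more plausibly use text embeddings.
        query = set(scenario.lower().split())

        def score(exp: Experience) -> float:
            words = set(exp.scenario.lower().split())
            return len(query & words) / max(len(query | words), 1)

        return sorted(self.items, key=score, reverse=True)[:k]


def reason(llm: LLM, memory: Memory, scenario: str) -> str:
    """Reasoning step: prompt the LLM with retrieved experiences as few-shot examples."""
    shots = memory.retrieve(scenario)
    blocks = [f"Scene: {e.scenario}\nDecision: {e.decision}\nNote: {e.note}" for e in shots]
    blocks.append(f"Scene: {scenario}\nDecision:")
    return llm("\n\n".join(blocks)).strip()


def reflect(memory: Memory, scenario: str, decision: str, collided: bool) -> None:
    """Reflection step: store a corrected or confirmed experience in memory."""
    if collided:
        # Assumed correction rule for illustration; a real module would query the LLM.
        memory.add(Experience(scenario, "decelerate",
                              f"'{decision}' caused a collision; decelerate instead."))
    else:
        memory.add(Experience(scenario, decision, "Safe outcome; this decision can be reused."))


def stub_llm(prompt: str) -> str:
    # Toy "common sense": avoid whatever a retrieved note flagged as a collision.
    return "decelerate" if "collision" in prompt else "accelerate"


def toy_env_step(scenario: str, decision: str) -> bool:
    # Toy environment: accelerating toward a close lead vehicle collides.
    return decision == "accelerate" and "close" in scenario


if __name__ == "__main__":
    memory = Memory()
    scene = "lead vehicle close ahead in the same lane"
    for episode in range(2):
        decision = reason(stub_llm, memory, scene)
        collided = toy_env_step(scene, decision)
        reflect(memory, scene, decision, collided)
        print(episode, decision, "collision" if collided else "safe")
```

With an empty memory the stub chooses `accelerate` and collides. The reflection step then stores a corrected experience, so on the second pass the retrieved note steers the decision to `decelerate`. This mirrors, in toy form, the abstract's claim that the system accumulates experience and evolves over time.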
Authors: Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao