
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models (2309.16292v3)

Published 28 Sep 2023 in cs.RO and cs.CL

Abstract: Recent advancements in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and uninterpretability. Drawing inspiration from the knowledge-driven nature of human driving, we explore the question of how to instill similar capabilities into autonomous driving systems and summarize a paradigm that integrates an interactive environment, a driver agent, as well as a memory component to address this question. Leveraging LLMs with emergent abilities, we propose the DiLu framework, which combines a Reasoning and a Reflection module to enable the system to perform decision-making based on common-sense knowledge and evolve continuously. Extensive experiments prove DiLu's capability to accumulate experience and demonstrate a significant advantage in generalization ability over reinforcement learning-based methods. Moreover, DiLu is able to directly acquire experiences from real-world datasets which highlights its potential to be deployed on practical autonomous driving systems. To the best of our knowledge, we are the first to leverage knowledge-driven capability in decision-making for autonomous vehicles. Through the proposed DiLu framework, LLM is strengthened to apply knowledge and to reason causally in the autonomous driving domain. Project page: https://pjlab-adg.github.io/DiLu/

Analysis of the Provided arXiv Template

The document provided is a template for authors preparing research papers for submission to arXiv. It contains no substantive research content, but it lays out a framework that helps submissions stay consistent and complete, so that the standard components of a scholarly paper are not omitted.

The template is divided into the sections typical of scholarly articles: Introduction, Methodology, Results, and Conclusion. It also includes standard components such as Abstract, Keywords, and Acknowledgments. Formatted placeholders for figures and tables serve researchers whose work involves empirical analysis or simulation results and needs a clear visual presentation of data.

Salient Features of the Template

The template's structure demonstrates several key components:

  1. Abstract and Keywords: These sections summarize the work and support indexing and search in digital libraries, making it easier for other researchers to find relevant work.
  2. Section Headings: The document defines hierarchical heading levels, which aid clarity and navigation. The use of 'section', 'subsection', and 'subsubsection' supports a logical structuring of the paper.
  3. Equation and Citation Formatting: The template shows how to incorporate mathematical equations and citations, so it can support the quantitative discussion that technical disciplines require.
  4. Figures and Tables: With placeholders for figures and tables, the template anticipates the necessity for visual aids in elucidating complex ideas and results.
  5. Lists: The inclusion of bullet lists supports concise enumeration of ideas or results and can aid in breaking down complex processes or methodologies.
  6. Bibliography: The template concludes with a standardized format for referencing, ensuring that proper credit is given and that the work is contextually situated within the existing literature.
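
As an illustration only, the components listed above could be laid out in a minimal LaTeX skeleton like the one below. This is a sketch, not the template itself: the title, author, keywords, equation, table values, and the citation key placeholder2023 are all hypothetical placeholders, and the plain article class is an assumption, since the analysis does not name the document class the template uses.

```latex
\documentclass{article}
\usepackage{graphicx} % only needed once the placeholder box is replaced by \includegraphics

% All names and values below are hypothetical placeholders.
\title{Placeholder Title}
\author{Placeholder Author}
\date{}

\begin{document}
\maketitle

\begin{abstract}
One-paragraph summary of the work.
\end{abstract}

% The article class has no keywords command, so keywords are set by hand.
\noindent\textbf{Keywords:} first keyword, second keyword

\section{Introduction}
\subsection{A Subsection}
\subsubsection{A Subsubsection}
Body text with a citation~\cite{placeholder2023}.

\section{Methodology}
A numbered equation:
\begin{equation}
  y = f(x; \theta)
  \label{eq:model}
\end{equation}

\begin{itemize}
  \item First point.
  \item Second point.
\end{itemize}

\section{Results}
\begin{figure}[h]
  \centering
  \rule{4cm}{2cm} % stand-in box; swap for \includegraphics{...}
  \caption{Placeholder figure.}
\end{figure}

\begin{table}[h]
  \centering
  \begin{tabular}{lc}
    \hline
    Setting & Value \\
    \hline
    A & 1.0 \\
    \hline
  \end{tabular}
  \caption{Placeholder table.}
\end{table}

\section{Conclusion}

\begin{thebibliography}{1}
  \bibitem{placeholder2023} A.~Author. \emph{Placeholder reference}. 2023.
\end{thebibliography}

\end{document}
```

The skeleton uses the built-in thebibliography environment so that it compiles without an external .bib file; a real submission would more likely rely on BibTeX or a similar tool.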

Implications for Practice

The widespread adoption of such templates in academic publishing supports efficient scientific communication. It reduces the formatting burden on authors, allowing them to focus on the quality of the content. It also helps reviewers and readers take in the information in a structured manner.

Future Developments

Looking ahead, as academic publishing evolves with technological advances, such templates could integrate more dynamic elements. These might include interactive components for online publications, such as more sophisticated data visualizations or executable environments for reproducibility checks.

The consistency enforced by templates like this is vital in the context of increasing interdisciplinary research; it enables different fields to align on a common standard, easing the dissemination and cross-pollination of ideas.

In conclusion, while this document may lack direct numerical or experimental insights, it is an exemplar of structured scientific communication. As the academic environment evolves, such templates will undoubtedly adapt, but their role in underpinning scholarly dissemination will remain indispensable.

Authors (10)
  1. Licheng Wen (31 papers)
  2. Daocheng Fu (22 papers)
  3. Xin Li (980 papers)
  4. Xinyu Cai (26 papers)
  5. Tao Ma (56 papers)
  6. Pinlong Cai (28 papers)
  7. Min Dou (22 papers)
  8. Botian Shi (57 papers)
  9. Liang He (202 papers)
  10. Yu Qiao (563 papers)
Citations (102)