SurrealDriver: Designing LLM-powered Generative Driver Agent Framework based on Human Drivers' Driving-thinking Data (2309.13193v2)
Abstract: Leveraging the advanced reasoning capabilities and extensive world knowledge of LLMs to construct generative agents for solving complex real-world problems is a major trend. However, LLMs inherently lack the embodiment that humans have, which leads to suboptimal performance on many embodied decision-making tasks. In this paper, we introduce a framework for building human-like generative driving agents that uses post-driving, self-reported driving-thinking data from human drivers as both demonstration and feedback. To capture high-quality natural-language data from drivers, we conducted urban driving experiments, recording drivers' verbalized thoughts under various conditions to serve as chain-of-thought prompts and demonstration examples for the LLM agent. The framework's effectiveness was evaluated through simulations and human assessments. Results indicate that incorporating expert demonstration data significantly reduced collision rates by 81.04% and increased human likeness by 50% compared to a baseline LLM-based agent. Our study provides insights into using natural-language human demonstration data for embodied tasks. The driving-thinking dataset is available at https://github.com/AIR-DISCOVER/Driving-Thinking-Dataset.
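To make the abstract's core mechanism concrete, below is a minimal sketch of how human drivers' verbalized thoughts could be assembled into few-shot chain-of-thought demonstrations for an LLM driving agent. This is not the authors' implementation: the prompt structure, field names, example demonstrations, and the `call_llm` stub are all illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): building a few-shot chain-of-thought
# prompt where each demonstration pairs a scene description with a human driver's
# verbalized reasoning and the action taken. Contents and names are assumed.

from dataclasses import dataclass
from typing import List


@dataclass
class DrivingDemo:
    scene: str     # natural-language description of the traffic situation
    thinking: str  # driver's post-driving verbalized reasoning (chain of thought)
    action: str    # the manoeuvre the driver chose


# Hypothetical demonstrations in the spirit of the driving-thinking data.
DEMOS: List[DrivingDemo] = [
    DrivingDemo(
        scene="Approaching a signalized intersection; the light just turned yellow; "
              "a cyclist is waiting on the right.",
        thinking="The light is about to change and the cyclist may start moving, "
                 "so braking smoothly is safer than accelerating through.",
        action="Decelerate and stop before the stop line.",
    ),
    DrivingDemo(
        scene="Slow truck ahead in the right lane; the left lane is clear for 200 m.",
        thinking="The truck blocks my view and is well below the speed limit; "
                 "the left lane is clear, so an overtake is low risk.",
        action="Signal left, change lane, and overtake.",
    ),
]


def build_prompt(current_scene: str, demos: List[DrivingDemo]) -> str:
    """Compose a few-shot chain-of-thought prompt from driver demonstrations."""
    parts = ["You are a careful human driver. Think step by step, then decide."]
    for d in demos:
        parts.append(f"Scene: {d.scene}\nThinking: {d.thinking}\nAction: {d.action}")
    parts.append(f"Scene: {current_scene}\nThinking:")
    return "\n\n".join(parts)


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM API call; wire this to a chat-completion endpoint."""
    raise NotImplementedError


if __name__ == "__main__":
    scene = "A pedestrian steps onto a zebra crossing 30 m ahead while traffic is light."
    print(build_prompt(scene, DEMOS))
```

In this sketch the demonstrations play the same role the abstract describes for the recorded verbalizations: they both show the agent what action to take and expose the human reasoning that led to it, so the model's output can be conditioned on driver-like thinking rather than on the action alone.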