
AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning (2404.06345v2)

Published 9 Apr 2024 in cs.AI and cs.RO

Abstract: Connected and autonomous driving has been developing rapidly in recent years. However, current autonomous driving systems, which are primarily based on data-driven approaches, exhibit deficiencies in interpretability, generalization, and continual learning. In addition, single-vehicle autonomous driving systems lack the ability to collaborate and negotiate with other vehicles, which is crucial for the safety and efficiency of autonomous driving. To address these issues, we leverage LLMs to develop a novel framework, AgentsCoDriver, that enables multiple vehicles to drive collaboratively. AgentsCoDriver consists of five modules: an observation module, a reasoning engine, a cognitive memory module, a reinforcement reflection module, and a communication module. By continuously interacting with the environment, it accumulates knowledge, lessons, and experiences over time, making it capable of lifelong learning. Moreover, through the communication module, different agents can exchange information and achieve negotiation and collaboration in complex traffic environments. Extensive experiments demonstrate the superiority of AgentsCoDriver.
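The five-module architecture described in the abstract can be sketched as a single agent step: observe, exchange messages, reason, act, then reflect outcomes into memory for lifelong learning. The sketch below is a minimal, hypothetical illustration; all class and method names (`Agent`, `drive_step`, etc.) are assumptions made for this example, not the paper's actual interfaces, and the LLM-based reasoning engine is replaced by a trivial stub.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    # Cognitive memory module: lessons accumulated across episodes
    memory: list = field(default_factory=list)

    def observe(self, env_state: str) -> str:
        # Observation module: encode raw environment state as a scene description
        return f"scene: {env_state}"

    def communicate(self, observation: str, messages: list) -> str:
        # Communication module: merge own observation with peer vehicles' messages
        return observation + " | peers: " + "; ".join(messages)

    def reason(self, context: str) -> str:
        # Reasoning engine: in the paper this is an LLM call conditioned on
        # memory; here a trivial rule stands in for illustration
        return "decelerate" if "obstacle" in context else "keep_lane"

    def reflect(self, decision: str, outcome: str) -> None:
        # Reinforcement reflection module: turn the outcome into a reusable
        # lesson, so memory (and thus behavior) improves over time
        self.memory.append(f"{decision} -> {outcome}")

def drive_step(agent: Agent, env_state: str, peer_messages: list) -> str:
    obs = agent.observe(env_state)
    ctx = agent.communicate(obs, peer_messages)
    decision = agent.reason(ctx)
    outcome = "safe"  # placeholder for real environment feedback
    agent.reflect(decision, outcome)
    return decision
```

Each call to `drive_step` both produces a decision and grows the agent's memory, which is the lifelong-learning loop the abstract describes.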

Authors (6)
  1. Senkang Hu (17 papers)
  2. Zhengru Fang (20 papers)
  3. Zihan Fang (17 papers)
  4. Xianhao Chen (50 papers)
  5. Yuguang Fang (55 papers)
  6. Yiqin Deng (22 papers)
Citations (15)