V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models (2502.09980v3)

Published 14 Feb 2025 in cs.CV and cs.RO

Abstract: Current autonomous driving vehicles rely mainly on their individual sensors to understand surrounding scenes and plan for future trajectories, which can be unreliable when the sensors are malfunctioning or occluded. To address this problem, cooperative perception methods via vehicle-to-vehicle (V2V) communication have been proposed, but they have tended to focus on perception tasks like detection or tracking. How those approaches contribute to overall cooperative planning performance is still under-explored. Inspired by recent progress using LLMs to build autonomous driving systems, we propose a novel problem setting that integrates a Multi-Modal LLM into cooperative autonomous driving, with the proposed Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose our baseline method Vehicle-to-Vehicle Multi-Modal LLM (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer various types of driving-related questions: grounding, notable object identification, and planning. Experimental results show that our proposed V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving, and outperforms other baseline methods that use different fusion approaches. Our work also creates a new research direction that can improve the safety of future autonomous driving systems. The code and data will be released to the public to facilitate open-source research in this field. Our project website: https://eddyhkchiu.github.io/v2vLLM.github.io/ .

Analyzing Vehicle-to-Vehicle Cooperative Autonomous Driving via LLMs

The paper "V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal LLMs" introduces a novel approach to enhancing autonomous driving systems through cooperative perception and planning using multi-modal LLMs. The work addresses a significant limitation in current autonomous vehicles: the dependency on individual sensor systems which can be unreliable due to occlusions or sensor failures. By leveraging vehicle-to-vehicle (V2V) communication, this paper proposes an integrated LLM solution for cooperative driving scenarios, showcasing a promising direction for improving vehicular safety and efficiency through shared perception and intelligence.

Key Contributions and Methods

  1. Problem Setting with V2V-LLM: The research pioneers the integration of LLMs into cooperative autonomous driving tasks. A novel V2V-LLM model is introduced, which fuses perception data sourced from multiple Connected Autonomous Vehicles (CAVs). Each CAV can query the LLM about its surroundings to support driving-related decision-making such as route planning and obstacle detection (a minimal sketch of this query interface follows the list).
  2. V2V-QA Dataset: A significant highlight of the work is the introduction of the Vehicle-to-Vehicle Question-Answering (V2V-QA) benchmark dataset. This dataset is designed to evaluate various tasks critical to cooperative driving, including grounding, notable object identification, and planning. The dataset builds upon real-world scenarios from the V2V4Real cooperative perception dataset, offering robust scenarios for testing LLM capabilities in understanding and acting upon multi-agent perception data.
  3. Baseline and Comparative Analysis: The V2V-LLM model's performance is rigorously compared against several baseline fusion methods, including no fusion, early fusion, and intermediate fusion techniques such as AttFuse, V2X-ViT, and CoBEVT. The experimental results show that V2V-LLM outperforms these methods, particularly in the planning and notable object identification tasks that are most critical to the efficacy of autonomous driving systems.
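
To make the question-answering setup in item 1 concrete, here is a minimal, hypothetical sketch of how an ego CAV might serialize shared detections from cooperating vehicles into a single driving query. The names (`CAVPerception`, `build_prompt`) are illustrative assumptions rather than the paper's API; the actual V2V-LLM fuses scene-level and object-level visual features inside a multi-modal LLM rather than through a text-only prompt.

```python
# Hypothetical sketch of the cooperative question-answering interface described
# above. All names are illustrative, not the paper's actual API.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CAVPerception:
    """Perception message shared by one connected autonomous vehicle (CAV)."""
    cav_id: str
    detections: List[Tuple[str, float, float]]  # (class, x, y) in a shared frame
    # In the paper, scene-level feature maps are also shared; omitted here.


def build_prompt(question: str, messages: List[CAVPerception]) -> str:
    """Serialize fused perception from all CAVs plus the driving question."""
    lines = []
    for msg in messages:
        for cls, x, y in msg.detections:
            lines.append(f"{msg.cav_id} sees {cls} at ({x:.1f}, {y:.1f}) m")
    return "\n".join(lines) + f"\nQuestion: {question}"


if __name__ == "__main__":
    shared = [
        CAVPerception("CAV_1", [("car", 12.0, 3.5)]),
        CAVPerception("CAV_2", [("pedestrian", 25.0, -1.0)]),  # occluded from CAV_1
    ]
    # A planning-style query; the answer would come from the multi-modal LLM.
    print(build_prompt("Suggest a safe future trajectory for CAV_1.", shared))
```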

Numerical Findings

  • The V2V-LLM model outperforms traditional fusion methods in the notable object identification and planning tasks. In planning, it records an average L2 distance error of 4.99 meters and a collision rate of 3.00%, improving on early fusion's 6.20-meter L2 distance error and 3.55% collision rate (a sketch of these metrics follows this list).
  • Communication costs are maintained at levels comparable to intermediate fusion methods while leveraging both scene-level and object-level feature inputs, showcasing the model's communication efficiency.
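
For reference, the two planning metrics quoted above can be computed as in the sketch below, under common conventions: the L2 error is averaged over the predicted waypoints, and a collision is counted when any waypoint comes within a fixed radius of a surrounding agent. The radius and array shapes are assumptions, not the paper's exact evaluation protocol.

```python
# Minimal sketch of the two planning metrics cited above; thresholds are
# assumptions, not the paper's exact evaluation settings.
import numpy as np


def average_l2_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and ground-truth waypoints.

    pred, gt: arrays of shape (T, 2) holding (x, y) positions over T steps.
    """
    return float(np.linalg.norm(pred - gt, axis=1).mean())


def collision_rate(plans: np.ndarray, agents: np.ndarray, radius: float = 1.0) -> float:
    """Fraction of planned trajectories passing within `radius` meters of any agent.

    plans: (N, T, 2) planned waypoints for N scenarios.
    agents: (N, M, 2) positions of M surrounding agents per scenario.
    """
    # Pairwise distances between every waypoint and every agent, per scenario.
    dists = np.linalg.norm(plans[:, :, None, :] - agents[:, None, :, :], axis=-1)
    collided = (dists < radius).any(axis=(1, 2))
    return float(collided.mean())


if __name__ == "__main__":
    pred = np.array([[1.0, 0.0], [2.0, 0.1], [3.0, 0.2]])
    gt = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
    print(f"avg L2 error: {average_l2_error(pred, gt):.2f} m")
```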

Implications and Future Directions

Practically, the integration of LLMs into autonomous driving systems as demonstrated by V2V-LLM suggests significant potential for reducing traffic accidents through enhanced situational awareness and decision-making capabilities. Theoretically, this research enriches the dialogue on multi-modal use of LLMs and their applicability beyond natural language processing, extending their use into complex real-time applications like cooperative autonomous driving.

The findings point toward several future research avenues, including the incorporation of high-definition map data to improve trajectory planning and the exploration of real-time LLM adaptations for diverse and dynamic traffic environments. Moreover, the approach opens possibilities for further advancements in the fields of intelligent transportation systems and distributed AI frameworks.

In summary, this work demonstrates an innovative application of LLMs in autonomous vehicle technology, reinforcing the potential for cooperative vehicle systems to fundamentally enhance safety and functionality in autonomous driving.

Authors (6)
  1. Hsu-kuang Chiu (9 papers)
  2. Ryo Hachiuma (24 papers)
  3. Chien-Yi Wang (29 papers)
  4. Stephen F. Smith (12 papers)
  5. Yu-Chiang Frank Wang (88 papers)
  6. Min-Hung Chen (41 papers)