
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving (2312.03661v3)

Published 6 Dec 2023 in cs.CV

Abstract: Large vision-language models (VLMs) have garnered increasing interest in autonomous driving areas, due to their advanced capabilities in complex reasoning tasks essential for highly autonomous vehicle behavior. Despite their potential, research in autonomous systems is hindered by the lack of datasets with annotated reasoning chains that explain the decision-making processes in driving. To bridge this gap, we present Reason2Drive, a benchmark dataset with over 600K video-text pairs, aimed at facilitating the study of interpretable reasoning in complex driving environments. We distinctly characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps, and the question-answer pairs are automatically collected from a diverse range of open-source outdoor driving datasets, including nuScenes, Waymo and ONCE. Moreover, we introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems, addressing the semantic ambiguities of existing metrics such as BLEU and CIDEr. Based on the proposed benchmark, we conduct experiments to assess various existing VLMs, revealing insights into their reasoning capabilities. Additionally, we develop an efficient approach to empower VLMs to leverage object-level perceptual elements in both feature extraction and prediction, further enhancing their reasoning accuracy. The code and dataset will be released.

Interpretable Chain-based Reasoning in Autonomous Driving: An Examination of Reason2Drive

The paper "Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving" proposes an expanded scope in the field of autonomous driving, emphasizing the need for interpretable reasoning models using large vision-LLMs (VLMs). The paper introduces Reason2Drive, a benchmark dataset consisting of over 600,000 video-text pairs, aiming to facilitate the development and evaluation of VLMs' reasoning capabilities in complex driving environments. This dataset is constructed from a combination of public driving datasets, namely nuScenes, Waymo, and ONCE.

Dataset Construction and Characteristics

Reason2Drive addresses a critical gap in current autonomous driving research: the scarcity of datasets that offer annotated reasoning chains elucidating decision-making processes. The dataset is structured to reflect the sequential stages—perception, prediction, and reasoning—involved in autonomous driving. These stages enable a comprehensive understanding of object-level and scenario-level tasks within this domain. For instance, perception tasks focus on the identification of objects and their attributes, prediction tasks involve forecasting future states, and reasoning tasks explore complex decision-making processes that incorporate both perceptual inputs and predictions.
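To make the three-stage decomposition concrete, here is a minimal sketch of what a single annotated record might look like, assuming a simple JSON-style layout. The field names and values are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical record illustrating the perception -> prediction -> reasoning
# decomposition described above. Field names are illustrative only; they are
# not taken from the Reason2Drive release.
record = {
    "source": "nuScenes",                  # originating open-source dataset
    "video_clip": "scene-0061_cam_front",  # identifier of the paired video
    "task": "reasoning",                   # one of: perception / prediction / reasoning
    "question": "Should the ego vehicle yield at the upcoming intersection?",
    "reasoning_chain": [
        "Perception: a pedestrian is standing at the crosswalk on the right.",
        "Prediction: the pedestrian is likely to start crossing.",
        "Reasoning: the ego vehicle should slow down and yield.",
    ],
    "answer": "Yes, the ego vehicle should yield to the pedestrian.",
}
```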

One of the distinctive features of Reason2Drive is the use of both automated and manual annotation, incorporating GPT-4 to ensure the diversity and accuracy of question-answer pairs. The dataset is notably larger and more intricate in terms of reasoning chains compared to previous benchmarks, which often oversimplified driving into straightforward question-answer formats with limited scope.
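As a rough illustration of this kind of LLM-assisted annotation, the sketch below paraphrases a templated question-answer pair via the OpenAI chat API. The prompt wording and helper function are assumptions; the paper's actual GPT-4 pipeline may differ.

```python
# Minimal sketch of LLM-assisted annotation: paraphrasing a templated
# question-answer pair to increase linguistic diversity. The prompt wording
# is an assumption, not the authors' published pipeline.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def paraphrase_qa(question: str, answer: str) -> str:
    """Ask the model for a semantically equivalent rewording of a QA pair."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Rewrite this driving QA pair with different wording but "
                f"identical meaning.\nQ: {question}\nA: {answer}"
            ),
        }],
    )
    return response.choices[0].message.content

print(paraphrase_qa("Is there a pedestrian ahead?", "Yes, at the crosswalk."))
```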

Evaluation Protocols

The paper introduces a novel aggregated evaluation metric tailored to chain-based reasoning, addressing the limitations of existing metrics such as BLEU and CIDEr, which are prone to semantic ambiguity. The proposed metric assesses the alignment and accuracy of reasoning chains by considering factors such as reasoning alignment, redundancy, and missing steps, along with a strict reasoning metric that evaluates the spatial accuracy of VLMs' predictions.
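A minimal sketch of one way such an aggregated score could work, assuming sentence-embedding similarity for step matching and simple penalties for redundant and missing steps; the threshold and weights are illustrative, not the paper's exact formulation.

```python
# Hedged sketch of a chain-based reasoning score: greedily align predicted
# steps to reference steps by embedding similarity, then penalize redundant
# and missing steps. Threshold and penalty weights are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chain_score(predicted: list[str], reference: list[str], thresh: float = 0.6) -> float:
    pred_emb = model.encode(predicted, normalize_embeddings=True)
    ref_emb = model.encode(reference, normalize_embeddings=True)
    sim = pred_emb @ ref_emb.T                      # cosine similarity matrix
    matched_refs = set()
    aligned = 0
    for row in sim:                                 # greedy one-to-one matching
        j = int(np.argmax(row))
        if row[j] >= thresh and j not in matched_refs:
            matched_refs.add(j)
            aligned += 1
    redundancy = len(predicted) - aligned           # unmatched predicted steps
    missing = len(reference) - len(matched_refs)    # unmatched reference steps
    # Reward alignment, penalize redundancy and omissions, clip to [0, 1].
    raw = (aligned - 0.5 * redundancy - 0.5 * missing) / max(len(reference), 1)
    return float(np.clip(raw, 0.0, 1.0))
```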

Experimental Insights

The authors conducted experiments with a range of existing VLMs to evaluate their reasoning capabilities on the Reason2Drive dataset. They found that many models struggle to effectively leverage perceptual priors, leading to weak reasoning performance. To address this, the authors present an efficient framework that augments VLMs with a prior tokenizer and an instructed vision decoder, improving the models' ability to extract and utilize object-level perceptual features and thereby boosting reasoning accuracy.
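To illustrate the roles these two components play, the hedged PyTorch sketch below projects object-level priors into the language model's embedding space and decodes hidden states back into spatial predictions. All dimensions, layer choices, and wiring are assumptions for illustration, not the authors' published architecture.

```python
# Hedged PyTorch sketch of the two components described above. Dimensions
# and layers are assumptions, not the paper's actual design.
import torch
import torch.nn as nn

class PriorTokenizer(nn.Module):
    """Project object-level priors (e.g., boxes + ROI features) into the
    LLM token embedding space so the VLM can attend to them directly."""
    def __init__(self, box_dim=4, feat_dim=256, embed_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(box_dim + feat_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, boxes, feats):  # (B, N, 4), (B, N, feat_dim)
        return self.proj(torch.cat([boxes, feats], dim=-1))  # (B, N, embed_dim)

class InstructedVisionDecoder(nn.Module):
    """Decode LLM hidden states back into perceptual outputs (here, boxes),
    so the reasoning text can be grounded in spatial predictions."""
    def __init__(self, embed_dim=4096, box_dim=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, embed_dim // 4),
            nn.GELU(),
            nn.Linear(embed_dim // 4, box_dim),
        )

    def forward(self, hidden_states):  # (B, T, embed_dim)
        return self.head(hidden_states)  # (B, T, box_dim)

# Usage: prior tokens are concatenated with text/image tokens before the LLM;
# the decoder reads selected hidden states to emit grounded predictions.
tokens = PriorTokenizer()(torch.rand(1, 8, 4), torch.rand(1, 8, 256))
boxes = InstructedVisionDecoder()(torch.rand(1, 8, 4096))
```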

Empirical results demonstrated that the proposed framework significantly outperformed baseline models in reasoning tasks, showcasing its potential in generating accurate perceptual and reasoning outputs. Additionally, the framework exhibited impressive generalization capabilities across unseen scenarios, emphasizing its robustness and practicality for real-world autonomous driving applications.

Implications and Future Directions

The introduction of Reason2Drive and its associated methodologies holds substantial implications for the field of autonomous driving. By providing a comprehensive benchmark that encompasses the full decision-making process, researchers are now equipped to develop and fine-tune VLMs capable of interpretable and reliable reasoning. This could enhance the transparency and accountability of autonomous systems, reducing reliance on empirical rules and improving diagnostics of decision failures.

Looking ahead, the paper suggests future research may focus on optimizing VLM architectures to better integrate perceptual features, refine evaluation metrics, and explore chain-based reasoning in more diverse and complex driving environments. The proposed framework and dataset offer a promising foundation for these pursuits, paving the way for advancements in autonomous system interpretability and decision-making efficacy.

Authors (7)
  1. Ming Nie
  2. Renyuan Peng
  3. Chunwei Wang
  4. Xinyue Cai
  5. Jianhua Han
  6. Hang Xu
  7. Li Zhang
Citations (25)