PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning (2406.01587v2)
Abstract: Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to outperform rule-based approaches in large-scale closed-loop scenarios. To address these issues, we propose PlanAgent, the first mid-to-mid planning system based on a Multi-modal Large Language Model (MLLM). The MLLM serves as a cognitive agent that brings human-like knowledge, interpretability, and common-sense reasoning to closed-loop planning. Specifically, PlanAgent leverages the MLLM through three core modules. First, an Environment Transformation module constructs a Bird's Eye View (BEV) map and a lane-graph-based textual description of the environment as inputs. Second, a Reasoning Engine module applies a hierarchical chain-of-thought that proceeds from scene understanding to lateral and longitudinal motion instructions, culminating in planner code generation. Finally, a Reflection module simulates and evaluates the generated planner to reduce the MLLM's uncertainty. Endowed with the common-sense reasoning and generalization capability of the MLLM, PlanAgent can effectively tackle both common and complex long-tailed scenarios. We evaluate PlanAgent on the large-scale and challenging nuPlan benchmarks, where a comprehensive set of experiments demonstrates that it outperforms the existing state of the art in closed-loop motion planning. Code will be released soon.
- Yupeng Zheng
- Zebin Xing
- Qichao Zhang
- Bu Jin
- Pengfei Li
- Yuhang Zheng
- Zhongpu Xia
- Kun Zhan
- Xianpeng Lang
- Yaran Chen
- Dongbin Zhao
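The abstract describes a transform-reason-reflect loop: build BEV and lane-graph inputs, prompt the MLLM with a hierarchical chain-of-thought that ends in planner code, then simulate and score that planner before accepting it. Below is a minimal, hypothetical Python sketch of that three-module structure. Every name here (`plan_step`, `render_bev`, `describe_lanes`, `simulate`, the prompt wording, the acceptance threshold and retry count) is an illustrative assumption, not the paper's actual code or API.

```python
"""Hypothetical sketch of PlanAgent's three-module loop, under the
assumptions stated above; not the authors' implementation."""

from dataclasses import dataclass
from typing import Callable

Scene = dict                         # placeholder for raw simulator state
MLLM = Callable[[bytes, str], str]   # (BEV image, prompt) -> planner code


@dataclass
class Candidate:
    code: str     # planner code emitted by the MLLM
    score: float  # closed-loop score from simulating that planner


def plan_step(
    mllm: MLLM,
    scene: Scene,
    render_bev: Callable[[Scene], bytes],    # Environment Transformation: BEV map
    describe_lanes: Callable[[Scene], str],  # Environment Transformation: lane-graph text
    simulate: Callable[[str, Scene], float],  # Reflection: run planner, return score
    accept_score: float = 0.8,  # assumed acceptance threshold
    max_retries: int = 3,       # assumed retry budget
) -> str:
    """One planning step: transform the scene, prompt the MLLM with a
    hierarchical chain-of-thought, then reflect by simulating the
    generated planner and retrying while its score is too low."""
    bev = render_bev(scene)
    lanes = describe_lanes(scene)

    # Reasoning Engine: hierarchical chain-of-thought, from scene
    # understanding to motion instructions to executable planner code.
    prompt = (
        "Step 1: summarize the driving scene.\n"
        "Step 2: choose lateral and longitudinal motion instructions.\n"
        "Step 3: emit planner code implementing those instructions.\n"
        f"Lane graph:\n{lanes}"
    )

    best = Candidate(code="", score=float("-inf"))
    for _ in range(max_retries):
        code = mllm(bev, prompt)        # generate a candidate planner
        score = simulate(code, scene)   # Reflection: closed-loop evaluation
        if score > best.score:
            best = Candidate(code, score)
        if best.score >= accept_score:  # planner accepted
            break
    return best.code
```

The scene- and model-specific pieces are injected as callables so the sketch stays self-contained; in the paper, the reflection signal comes from simulating the generated planner in closed loop rather than trusting a single MLLM output.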