DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving (2409.18053v3)

Published 26 Sep 2024 in cs.RO and cs.AI

Abstract: We present a novel autonomous driving framework, DualAD, designed to imitate human reasoning during driving. DualAD comprises two layers: a rule-based motion planner at the bottom layer that handles routine driving tasks requiring minimal reasoning, and an upper layer featuring a rule-based text encoder that converts driving scenarios from absolute states into text descriptions. This text is then processed by an LLM to make driving decisions. The upper layer intervenes in the bottom layer's decisions when potential danger is detected, mimicking human reasoning in critical situations. Closed-loop experiments demonstrate that DualAD, using a zero-shot pre-trained model, significantly outperforms rule-based motion planners that lack reasoning abilities. Our experiments also highlight the effectiveness of the text encoder, which considerably enhances the model's scenario understanding. Additionally, the integrated DualAD model improves with stronger LLMs, indicating the framework's potential for further enhancement. Code and benchmarks are available at github.com/TUM-AVS/DualAD.

Summary

The paper introduces a dual-layer framework that combines rule-based motion planning with LLM reasoning to enhance decision-making in complex driving scenarios.
It details a method where driving situations are encoded as text for LLM interpretation, enabling adaptive high-level decisions in critical conditions.
Experimental results on the nuPlan dataset demonstrate a 44% improvement in reactive closed-loop scores, highlighting robust performance in challenging scenarios.

The paper "DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving" by Dingrui Wang, Marc Kaufeld, and Johannes Betz proposes a framework for adding reasoning capabilities to autonomous driving (AD) systems. The dual-layer design aims to imitate how humans reason while driving by pairing a rule-based motion planner with an LLM that handles high-level reasoning.

Overview

DualAD is structured into two core layers: a lower layer that handles the reference path and motion planning with rule-based methods, and an upper layer that assesses potential dangers and modifies driving behavior based on a semantic understanding of the driving environment. Using an LLM in the upper layer is intended to improve reasoning and decision-making, particularly in critical or risky scenarios where traditional rule-based planners may falter.

Methodology

The architecture of DualAD leverages both conventional rule-based planners and LLMs to mimic the human approach to driving. The lower layer uses rule-based planners for routine, straightforward driving tasks. In contrast, the upper layer dynamically interprets driving scenarios through a rule-based text encoder that translates the environment's states into textual descriptions. These descriptions are processed by an LLM to make informed driving decisions, such as adjusting speed or applying hard braking in perilous situations.
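The override relationship between the two layers can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Command`, `PlannerOutput`, `apply_command`, and `dual_layer_step`, the three-command vocabulary, the `slow_factor` of 0.5, and the fallback to the planner when the reply cannot be parsed are all assumptions introduced here. The LLM is passed in as a plain callable, so no particular API is implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Command(Enum):
    """Hypothetical high-level commands the upper layer may issue."""
    KEEP = "keep"              # accept the planner's output unchanged
    SLOW_DOWN = "slow_down"    # reduce the planner's target speed
    HARD_BRAKE = "hard_brake"  # command a full stop


@dataclass
class PlannerOutput:
    target_speed: float  # m/s, proposed by the rule-based lower layer


def apply_command(plan: PlannerOutput, command: Command,
                  slow_factor: float = 0.5) -> PlannerOutput:
    """Modify the lower layer's plan according to the upper layer's command."""
    if command is Command.HARD_BRAKE:
        return PlannerOutput(target_speed=0.0)
    if command is Command.SLOW_DOWN:
        return PlannerOutput(target_speed=plan.target_speed * slow_factor)
    return plan


def dual_layer_step(plan: PlannerOutput, scenario_text: str,
                    danger_detected: bool,
                    query_llm: Callable[[str], str]) -> PlannerOutput:
    """One planning step: the LLM is consulted only when danger is flagged."""
    if not danger_detected:
        return plan  # routine driving: the rule-based plan passes through
    reply = query_llm(scenario_text).strip().lower()
    try:
        command = Command(reply)
    except ValueError:
        command = Command.KEEP  # assumed fallback for an unparseable reply
    return apply_command(plan, command)


if __name__ == "__main__":
    # Stub standing in for a real LLM call.
    stub_llm = lambda text: "slow_down"
    print(dual_layer_step(PlannerOutput(10.0), "A pedestrian is ahead.", True, stub_llm))
```

Keeping the LLM behind a callable makes the lower layer usable on its own, which matches the description of the rule-based planner handling routine driving without intervention.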

The primary components of DualAD include:

Text Encoder: Converts driving scenarios into textual descriptions. These descriptions allow the LLM to better understand the context of the scenario, effectively enhancing its reasoning capabilities.
Rule-based Motion Planner: Utilizes models like the Intelligent Driver Model (IDM), Lattice Planner, and Frenetix Planner to perform path planning and motion control.
LLM Integration: Processes text descriptions to output high-level commands that can overrule the decisions made by the rule-based motion planner when deemed necessary.
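To make the text encoder's role concrete, here is a small sketch of how absolute agent states might be turned into ego-relative sentences. The paper's actual templates and state fields are not given in this summary, so `AgentState`, `describe_agent`, `encode_scene`, and the sentence wording are hypothetical; the only technique assumed is a standard rotation of world coordinates into the ego vehicle's frame.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AgentState:
    name: str
    x: float        # world-frame position, metres
    y: float
    heading: float  # radians, counter-clockwise from the world x-axis
    speed: float    # m/s


def describe_agent(ego: AgentState, other: AgentState) -> str:
    """Describe another agent's position relative to the ego vehicle."""
    dx, dy = other.x - ego.x, other.y - ego.y
    # Rotate the offset into the ego frame: +forward along heading, +left.
    cos_h, sin_h = math.cos(ego.heading), math.sin(ego.heading)
    forward = dx * cos_h + dy * sin_h
    left = -dx * sin_h + dy * cos_h
    distance = math.hypot(dx, dy)
    longitudinal = "ahead of" if forward >= 0 else "behind"
    lateral = "left" if left >= 0 else "right"
    return (f"{other.name} is {distance:.1f} m away, {longitudinal} the ego "
            f"vehicle and to its {lateral}, moving at {other.speed:.1f} m/s.")


def encode_scene(ego: AgentState, others: List[AgentState]) -> str:
    """Build a plain-text scene description suitable for an LLM prompt."""
    lines = [f"The ego vehicle is moving at {ego.speed:.1f} m/s."]
    lines += [describe_agent(ego, other) for other in others]
    return "\n".join(lines)


if __name__ == "__main__":
    ego = AgentState("ego", 0.0, 0.0, 0.0, 8.0)
    car = AgentState("Vehicle 1", 10.0, -2.0, 0.0, 5.0)
    print(encode_scene(ego, [car]))
```

Expressing positions relative to the ego vehicle, rather than as raw coordinates, is one plausible reason a text description could be easier for a general-purpose LLM to interpret.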

Experimental Setup

The efficacy of DualAD was evaluated using the nuPlan dataset, which provides a comprehensive array of driving scenarios in both reactive and non-reactive modes. Metrics such as the non-reactive closed-loop score (NR-CLS) and reactive closed-loop score (R-CLS) were employed to measure performance improvements.

Different LLMs were tested, including GLM-4-Flash and GPT-4o, to understand how varying levels of reasoning from these models impact the overall system performance. Notably, even without specific training on driving tasks, the LLMs significantly enhanced the performance of DualAD, particularly in challenging scenarios.

Results

The experiments yield three main findings:

Enhanced Performance: DualAD demonstrated significant improvements over traditional rule-based planners. For instance, integrating Lattice-IDM with LLMs led to a 44% improvement in R-CLS on the Hard-55 benchmark.
Robustness in Complex Scenarios: The framework showed notable resilience and effectiveness in handling complex and critical scenarios, outperforming state-of-the-art planners like PDM-Closed and UrbanDriver in certain benchmarks.
Influence of LLM Quality: Results varied with the strength and capabilities of the LLM used. GPT-4o, a more advanced LLM, provided superior performance and stability compared to GLM-4-Flash.

Discussion and Implications

DualAD represents an innovative approach to integrating high-level reasoning into autonomous driving systems. The use of LLMs enables a deeper and more nuanced understanding of driving environments, paving the way for safer and more reliable AD systems. Moreover, the flexibility of this framework allows it to incorporate continually improving LLMs, ensuring future advancements in both model capabilities and overall performance.

One of the key implications of this research is the potential for significant reductions in computational burden. By using LLMs for high-level reasoning akin to human cognitive processes, the system can achieve safer driving outcomes without the intensive computational requirements typically associated with purely data-driven methods.

Future Work

While DualAD has shown notable promise, several areas for future research remain. These include:

Incorporating Map Information: Integrating detailed map information such as lane positions and road types into the text descriptions could enhance the LLM's understanding and decision-making.
Expanding Decision Scope: Currently, the LLM focuses primarily on velocity adjustments. Incorporating decisions related to trajectory and steering could further leverage the LLM's reasoning abilities.
Temporal Context: Processing multiple frames to include temporal context could provide the LLM with historical data, further improving scenario understanding and decision-making accuracy.

Conclusion

The DualAD framework effectively augments rule-based planning with the reasoning capabilities of LLMs, mimicking human cognitive flexibility in autonomous driving. The compelling results achieved in handling critical driving scenarios highlight the potential of integrating LLMs into AD systems, offering a pathway to more intelligent, efficient, and safer autonomous driving technologies. As future developments in LLMs emerge, DualAD is poised to continually benefit from these advancements, underscoring its long-term viability and impact in the field of autonomous driving.
