- The paper introduces a novel framework that fuses planning (MCTS) with deep reinforcement learning to achieve efficient real-time tactical decision making in autonomous driving.
- It employs a neural network to estimate state values and action probabilities while using progressive widening to manage continuous state spaces effectively.
- Comparative evaluations in highway and exit scenarios demonstrate that the proposed agent outperforms baseline MCTS and rule-based models in success rate and sample efficiency.
Combining Planning and Deep Reinforcement Learning in Tactical Decision Making for Autonomous Driving
This paper addresses a critical challenge in autonomous vehicle operation: tactical decision making. The research presents a novel framework integrating planning and learning methods, specifically Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (DRL), within the context of autonomous driving. The authors leverage the AlphaGo Zero algorithm, extending it into a domain with continuous state spaces and scenarios where self-play is inapplicable—key considerations for real-world driving environments.
Framework Overview
The proposed framework merges MCTS with a neural network to create a versatile decision-making agent capable of real-time tactical decisions under uncertainty. MCTS explores the state space, while the neural network guides this exploration by estimating both the value of states and a probability distribution over actions. This dual mechanism improves search efficiency by focusing computation on the most promising regions of the decision space.
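To make this dual mechanism concrete, the sketch below shows a PUCT-style selection rule of the kind used in AlphaGo Zero-like searches: the network's action priors bias exploration toward promising moves, while the tree's accumulated returns supply the value term. The `Node` layout and the `c_puct` constant are illustrative assumptions rather than the paper's exact formulation.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    priors: dict                                   # action -> prior P(a|s) from the network
    children: dict = field(default_factory=dict)   # action -> child Node
    visit_count: int = 0
    total_value: float = 0.0

def select_action(node: Node, c_puct: float = 1.5):
    """PUCT-style selection: balance the tree's averaged value estimate Q
    against the network prior P, with an exploration bonus that decays as
    a child accumulates visits."""
    sqrt_total = math.sqrt(sum(c.visit_count for c in node.children.values()) + 1)
    best_score, best_action = -math.inf, None
    for action, prior in node.priors.items():
        child = node.children.get(action)
        visits = child.visit_count if child else 0
        q = child.total_value / visits if visits else 0.0  # mean return so far
        score = q + c_puct * prior * sqrt_total / (1 + visits)
        if score > best_score:
            best_score, best_action = score, action
    return best_action
```

In an AlphaGo Zero-style agent, a rule of this form replaces the UCB1 selection step of vanilla MCTS, which is how the learned network focuses the search.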
Notably, the framework dispenses with the random rollouts of standard MCTS: leaf nodes are instead scored by the neural network's value estimate, and state transitions are sampled from a generative model, fitting the continuous and partially observable nature of driving environments. Progressive widening keeps the resulting tree tractable by limiting how many sampled successor states each node may expand as a function of its visit count.
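A minimal sketch of that widening rule on state transitions is shown below, assuming a generative model exposing a `sample(state, action)` method; the constants `k` and `alpha` are illustrative. Without such a cap, a stochastic continuous-state simulator would hand every visit a brand-new child, so the tree would grow wide but never deep.

```python
import random

def sample_transition(children, visit_count, state, action, model,
                      k=1.0, alpha=0.3):
    """Progressive widening on state transitions: while the number of sampled
    successor states stays below k * N^alpha (N = visits of this state-action
    pair), draw a fresh successor from the generative model; afterwards,
    revisit existing children so their value estimates can converge."""
    if len(children) < k * max(visit_count, 1) ** alpha:
        next_state = model.sample(state, action)   # assumed generative-model interface
        children.append(next_state)
        return next_state
    return random.choice(children)
```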
Numerical Results and Evaluation
The paper evaluates the framework in simulation on two primary scenarios: continuous highway driving and highway exits, chosen to represent distinct tactical decision-making challenges. The proposed MCTS/NN agent is compared against baseline methods, including standard MCTS and the rule-based IDM/MOBIL model (sketched below). Crucially, the MCTS/NN agent demonstrates superior performance across both scenarios, excelling in particular at complex situations that require anticipatory planning, as evidenced by its significantly higher success rate in the highway exit task.
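For context, the IDM half of the rule-based baseline reduces to a single longitudinal acceleration equation; the sketch below uses common textbook parameter values, which are assumptions rather than the paper's settings. MOBIL adds an analogous incentive-and-safety rule for lane changes.

```python
import math

def idm_acceleration(v, v_lead, gap, v0=25.0, T=1.5, s0=2.0,
                     a_max=1.4, b=2.0, delta=4):
    """Intelligent Driver Model: longitudinal acceleration from the ego
    speed v, lead-vehicle speed v_lead, and bumper-to-bumper gap (SI units).
    Parameter values are common textbook defaults, not the paper's settings."""
    dv = v - v_lead                                            # closing speed
    s_star = s0 + v * T + v * dv / (2 * math.sqrt(a_max * b))  # desired gap
    return a_max * (1 - (v / v0) ** delta - (s_star / gap) ** 2)
```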
In particular, the framework trains in a sample-efficient manner relative to previous methods such as Deep Q-Networks (DQNs), achieving comparable or better performance with fewer samples. This efficiency is attributed to the integration of MCTS, which adds structure to the learning process and guides it more effectively than standalone reinforcement learning.
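One way to see where that structure enters is the training target: rather than bootstrapped TD errors as in DQN, an AlphaGo Zero-style agent fits the network to the search's own output. The sketch below assumes a PyTorch network returning policy logits and a scalar value, which is an illustrative interface rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def search_guided_loss(net, states, search_policies, returns):
    """AlphaGo Zero-style targets: regress the policy head toward the
    visit-count distribution produced by the tree search, and the value
    head toward the observed return."""
    policy_logits, values = net(states)
    policy_loss = -(search_policies * F.log_softmax(policy_logits, dim=-1)).sum(-1).mean()
    value_loss = F.mse_loss(values.squeeze(-1), returns)
    return policy_loss + value_loss
```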
Implications and Future Research Directions
The integration of planning and learning has significant implications for the development of autonomous driving systems. By enhancing both sample efficiency and decision quality, the framework presents a promising avenue for advancing autonomous vehicle safety and performance. It highlights that combining model-based planning with learning-based policy improvement can yield robust decision-making frameworks that adapt across diverse driving scenarios.
Future work could adapt this framework to more complex environments or extend it to integrate vehicle-to-vehicle communication. Enhancing the interpretability of neural network decision processes and ensuring the explainability of actions in critical situations remain pertinent concerns for wider adoption. Furthermore, while the emphasis here is on tactical decision making, extending similar frameworks to incorporate operational and strategic decision layers could yield a more holistic autonomous vehicle control system.
In conclusion, the paper represents a meaningful step forward in autonomous driving research, effectively combining theoretical advancements with practical considerations. The flexibility and adaptability of the approach underscore its potential for broader applications within the autonomous driving domain and beyond.