Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning
The paper presents an innovative framework for urban autonomous driving using an interpretable end-to-end approach grounded in latent deep reinforcement learning (RL). The authors, Jianyu Chen, Shengbo Eben Li, and Masayoshi Tomizuka, address the limitations inherent in traditional modularized autonomous driving systems by offering a method that integrates perception, decision-making, and control into a single learning task. This integration is pivotal for handling complex urban scenarios, which modular approaches often find challenging.
Methodology
The proposed system employs a sequential latent environment model coupled with maximum entropy RL to manage the intricate dynamics of urban driving environments. The model encodes high-dimensional raw observations, such as visual inputs, spatial features, road conditions, and the states of surrounding road users, into a low-dimensional latent space. This compressed representation not only reduces the sample complexity of RL but also supports decoding semantic bird's-eye-view masks that make the driving policy interpretable.
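As a rough illustration of this architecture, the sketch below (PyTorch; not the authors' code, and all layer sizes, the latent dimension, and the 64x64 observation shape are illustrative assumptions) shows how an observation encoder, a recurrent Gaussian latent model, and a maximum-entropy (SAC-style) actor operating purely on the latent state could fit together.

```python
# Minimal sketch of a sequential latent model feeding a latent policy.
# Sizes (latent_dim=64, 64x64 inputs) are illustrative assumptions.
import torch
import torch.nn as nn

class ObservationEncoder(nn.Module):
    """Compress a high-dimensional sensor image into a feature vector."""
    def __init__(self, in_channels=3, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(feature_dim)

    def forward(self, obs):                      # obs: (B, C, 64, 64)
        return self.fc(self.conv(obs))

class SequentialLatentModel(nn.Module):
    """Gaussian latent state z_t conditioned on the previous hidden state and action."""
    def __init__(self, feature_dim=256, action_dim=2, latent_dim=64):
        super().__init__()
        self.rnn = nn.GRUCell(feature_dim + action_dim, 256)
        self.post = nn.Linear(256, 2 * latent_dim)   # posterior mean and log-std

    def forward(self, feat, prev_action, hidden):
        hidden = self.rnn(torch.cat([feat, prev_action], dim=-1), hidden)
        mean, log_std = self.post(hidden).chunk(2, dim=-1)
        z = mean + log_std.exp() * torch.randn_like(mean)  # reparameterised sample
        return z, hidden

class LatentGaussianPolicy(nn.Module):
    """SAC-style stochastic actor that sees only the low-dimensional latent state."""
    def __init__(self, latent_dim=64, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * action_dim))

    def forward(self, z):
        mean, log_std = self.net(z).chunk(2, dim=-1)
        std = log_std.clamp(-5, 2).exp()
        return torch.tanh(mean + std * torch.randn_like(std))  # squashed action sample
```

Because the actor and critic consume only the compact latent state rather than raw sensor frames, the RL component deals with a far smaller input space, which is where the reduction in sample complexity comes from.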
The latent space is constrained so that it can explain the policy: decoding it into semantic masks ties the learned representation to interpretable intermediate properties of the scene. This is a significant enhancement, as it provides insight into how the policy reasons about the environment, something lacking in many end-to-end systems that operate as black boxes.
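The sketch below shows one plausible way such a constraint can be imposed: a small deconvolutional decoder maps the latent state back to semantic bird's-eye-view masks, and a reconstruction loss keeps the latent consistent with them. The channel semantics named in the comment and the 32x32 mask resolution are assumptions, not the paper's exact mask layout.

```python
# Sketch of the interpretability mechanism: decode the latent into semantic
# bird's-eye-view masks and penalise disagreement with reference masks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskDecoder(nn.Module):
    """Decode latent z into a multi-channel semantic bird's-eye-view mask."""
    def __init__(self, latent_dim=64, mask_channels=4):  # e.g. road, route, vehicles, ego (assumed)
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, mask_channels, 4, stride=2, padding=1),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 128, 4, 4)
        return torch.sigmoid(self.deconv(x))      # (B, mask_channels, 32, 32)

def mask_reconstruction_loss(decoder, z, target_masks):
    """Binary cross-entropy between decoded and reference semantic masks."""
    return F.binary_cross_entropy(decoder(z), target_masks)
```

At deployment time the same decoder can be run on the live latent state, so the masks double as a visual explanation of what the policy currently "believes" about the scene.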
Experimental Evaluation
The authors implemented their method in the CARLA simulator, a high-fidelity open-source platform widely used for evaluating autonomous driving systems. The proposed method significantly outperformed several state-of-the-art baselines, including DQN, DDPG, TD3, and SAC, particularly in crowded urban scenarios. Notably, the decoded semantic masks made it possible to inspect how the agent interpreted complex driving situations, turning the policy's internal state into actionable insight.
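To make the evaluation setup concrete, here is a hypothetical minimal episode skeleton using CARLA's public Python API. It assumes a CARLA server running on localhost:2000, and the constant control command stands in for the learned latent policy, which is omitted to keep the sketch self-contained; this is not the authors' evaluation script.

```python
# Hypothetical closed-loop episode in CARLA (assumes a running CARLA server).
import carla

def run_episode(max_steps=200, host='localhost', port=2000):
    client = carla.Client(host, port)
    client.set_timeout(10.0)
    world = client.get_world()

    # Spawn an ego vehicle at the first available spawn point.
    bp_lib = world.get_blueprint_library()
    ego_bp = bp_lib.filter('vehicle.*')[0]
    spawn = world.get_map().get_spawn_points()[0]
    ego = world.spawn_actor(ego_bp, spawn)
    try:
        for _ in range(max_steps):
            world.wait_for_tick()                  # advance one frame (asynchronous mode)
            # In the real evaluation the latent policy would map observations to
            # (steer, throttle); a constant control is used here as a placeholder.
            ego.apply_control(carla.VehicleControl(throttle=0.3, steer=0.0))
    finally:
        ego.destroy()
```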
Quantitative evaluation reports an average pixel difference of merely 0.032 for the decoded masks, indicating high fidelity in capturing the semantic state of the environment. Moreover, the method's ability to visualize failure cases, such as collisions, underscores its utility for failure diagnosis and safety assessment.
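Our reading of this metric is a simple mean absolute per-pixel difference between decoded and reference masks; the short sketch below computes it under that assumption (mask values in [0, 1]), and is not the authors' exact evaluation code.

```python
# Sketch of an average pixel-difference metric for decoded masks (assumed definition).
import numpy as np

def average_pixel_difference(decoded, reference):
    """Mean absolute per-pixel difference over a batch of mask pairs."""
    decoded = np.asarray(decoded, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    return float(np.abs(decoded - reference).mean())

# Example with two random 32x32 masks; a real evaluation would compare the
# decoder's output against the corresponding ground-truth mask renderings.
pred = np.random.rand(32, 32)
true = np.random.rand(32, 32)
print(average_pixel_difference(pred, true))
```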
Implications and Future Directions
The implications of this work are substantial for both theoretical advancements and practical deployments in autonomous vehicle systems. By demonstrating the integration of interpretability into end-to-end learning frameworks without sacrificing performance, the authors pave the way for more transparent and understandable AI-driven control systems. This is especially pertinent in safety-critical applications such as urban autonomous driving, where understanding the decision-making process is imperative for developing trust and accountability.
Looking forward, the paper suggests exploring model-based approaches to bolster interpretability and performance, a natural extension that could leverage additional physics-based or rule-based reasoning layers within the learned model. The fusion of interpretability, planning, and robust control in an end-to-end trainable framework sets a promising research trajectory in the field of intelligent transportation systems and beyond.
In conclusion, this paper contributes a robust methodological framework and reports compelling results, supporting the future development and deployment of interpretable autonomous driving systems in urban environments. The work not only advances the technical sophistication of autonomous systems but also adds to the growing body of research on making AI systems more explainable and trustworthy.