Overview of "End-to-End Urban Driving by Imitating a Reinforcement Learning Coach"
This paper presents a novel approach to end-to-end urban driving that uses a reinforcement learning (RL) expert as a coach for training imitation learning (IL) agents. It targets a shortcoming of existing methods that rely heavily on expert demonstrations from human drivers or rule-based autopilots: such supervision is limited in quality and suffers from covariate shift once the learner drifts away from the demonstrated states. The authors propose a more effective alternative in which an RL-trained coach supplies richer, on-policy supervision for training IL agents.
Methodology and Architecture
The central innovation of this work is "Roach," a reinforcement learning coach that achieves high performance in the CARLA simulator. Roach maps bird's-eye view (BEV) images to continuous low-level driving actions. The paper outlines several key design choices in the training of Roach:
- Beta Distribution Usage: Unlike a Gaussian, the Beta distribution constrains the action output to a bounded range, which better matches bounded steering and acceleration commands and makes the learning problem better behaved.
- Exploration Loss: An exploration loss augments the standard entropy bonus, encouraging exploration while keeping the policy consistent with desirable driving behavior. By focusing on the deviations that lead to infractions, it promotes more informed exploration than entropy maximization alone; a hedged sketch of how such an objective might be assembled follows this list.
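The snippet below is a minimal sketch of how a PPO-style objective with these two terms might be assembled. The exploration prior, the pre-termination mask, and the weighting coefficients (`ent_coef`, `exp_coef`) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
from torch.distributions import kl_divergence

def coach_policy_loss(dist, old_log_prob, actions, advantages,
                      pre_termination_mask, exploration_prior,
                      clip_eps=0.2, ent_coef=0.01, exp_coef=0.05):
    """Sketch of a clipped surrogate loss with entropy and exploration terms.

    dist:                 Beta action distribution predicted for the batch
    old_log_prob:         log-probabilities under the behavior policy
    actions:              sampled actions in (0, 1), later rescaled for control
    advantages:           advantage estimates (e.g. from GAE)
    pre_termination_mask: 1 for steps shortly before an infraction, else 0
    exploration_prior:    a Beta prior suggesting a sensible reaction (e.g. braking)
    """
    log_prob = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(log_prob - old_log_prob)

    # Clipped PPO surrogate objective.
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages,
    ).mean()

    # Entropy bonus: keeps the policy from collapsing prematurely.
    entropy_bonus = dist.entropy().sum(-1).mean()

    # Exploration loss: only on steps preceding an episode-ending infraction,
    # pull the policy toward a prior that encodes a safer behavior.
    kl_to_prior = kl_divergence(exploration_prior, dist).sum(-1)
    exploration_loss = (pre_termination_mask * kl_to_prior).mean()

    return -(surrogate + ent_coef * entropy_bonus) + exp_coef * exploration_loss
```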
The architecture of Roach uses a convolutional neural network to encode bird's-eye view images rendered by the CARLA simulator, followed by fully connected layers that produce both the value estimate and the policy distribution.
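A schematic sketch of such an actor-critic network is given below, assuming a PyTorch implementation. The layer sizes, the measurement input, and the softplus parameterization of the Beta heads are illustrative choices rather than the configuration reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Beta

class RoachLikePolicy(nn.Module):
    """Sketch of a Roach-style actor-critic over a bird's-eye view (BEV) input."""

    def __init__(self, bev_channels=15, measurement_dim=6, action_dim=2):
        super().__init__()
        # Convolutional encoder for the rasterized BEV image.
        self.encoder = nn.Sequential(
            nn.Conv2d(bev_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fully connected trunk fusing image features with scalar measurements
        # (e.g. current speed); it feeds both the policy and the value head.
        self.trunk = nn.Sequential(
            nn.Linear(128 + measurement_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.alpha_head = nn.Linear(256, action_dim)
        self.beta_head = nn.Linear(256, action_dim)
        self.value_head = nn.Linear(256, 1)

    def forward(self, bev, measurements):
        features = self.trunk(
            torch.cat([self.encoder(bev), measurements], dim=-1))
        # softplus(.) + 1 keeps both Beta parameters above 1, yielding
        # unimodal action distributions with bounded support on (0, 1).
        alpha = F.softplus(self.alpha_head(features)) + 1.0
        beta = F.softplus(self.beta_head(features)) + 1.0
        return Beta(alpha, beta), self.value_head(features), features
```

Actions sampled from the Beta distribution lie in (0, 1) and would be affinely rescaled to the simulator's steering and acceleration ranges.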
IL Agent Enhancement via RL Expertise
The paper also investigates how imitation learning agents can benefit from the richer supervision signals that Roach provides:
- Soft Targets: Rather than imitating only deterministic expert actions, the IL agent learns from Roach's full action distributions, which carry richer and more informative learning signals.
- Feature Matching: IL agents are guided to match their internal latent features to Roach's, encouraging them to encode the underlying driving behavior more effectively and improving generalization, especially to unseen towns and weather conditions.
- Value Estimations: A value head added to the IL agent regresses predicted future returns as an auxiliary task, which the paper argues further aids learning robust driving policies. A hedged sketch combining these three supervision signals follows this list.
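The following sketch shows how these signals could be combined into a single imitation loss. The choice of distance measures (a KL divergence for the soft targets, mean squared error for features and values) and the weights `w_action`, `w_feat`, and `w_value` are assumptions made for illustration, not values taken from the paper.

```python
import torch.nn.functional as F
from torch.distributions import kl_divergence

def coach_supervised_il_loss(student_dist, student_features, student_value,
                             expert_dist, expert_features, expert_value,
                             w_action=1.0, w_feat=0.05, w_value=0.1):
    """Sketch of an imitation objective distilled from an RL coach.

    student_dist / expert_dist:         Beta action distributions
    student_features / expert_features: latent features to be matched
    student_value / expert_value:       predicted returns (value estimates)
    """
    # Soft targets: match the coach's full action distribution
    # rather than a single deterministic action.
    action_loss = kl_divergence(expert_dist, student_dist).sum(-1).mean()

    # Feature matching: pull the student's latent representation toward
    # the coach's, encouraging it to encode the same driving cues.
    feature_loss = F.mse_loss(student_features, expert_features)

    # Value supervision: an auxiliary head regresses the coach's
    # predicted returns as an additional learning signal.
    value_loss = F.mse_loss(student_value, expert_value)

    return w_action * action_loss + w_feat * feature_loss + w_value * value_loss
```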
Results and Analysis
The empirical results show that Roach sets a new performance upper bound on the CARLA benchmarks, surpassing the rule-based autopilots used as experts in prior work. IL agents trained under Roach's supervision achieve substantial improvements over previous state-of-the-art models. On the NoCrash-dense benchmark, the Roach-supervised IL agent maintains a high success rate even in a new town and under new weather conditions, indicating strong generalization.
The implications of this research are twofold:
- Practically, it indicates a pathway to create more efficient and generalizable autonomous driving systems.
- Theoretically, it prompts a reevaluation of how IL and RL can interact synergistically to address the data-efficiency challenges of autonomous driving.
Future Directions
The paper points toward potential enhancements in agent architecture and suggests further work on reward design and input representations (for example, to close the sim-to-real gap more effectively). Moreover, deploying such a model in real-world scenarios could benefit significantly from advances in sensor data synthesis, helping to address current bottlenecks in simulating real-world complexity.
In conclusion, this research provides an important contribution to the field by illustrating a sophisticated method to leverage reinforcement learning in improving the training and performance of end-to-end autonomous driving systems. Its approach of using a reinforcement learning coach could extend beyond urban driving, offering insight into complex decision-making problems faced by autonomous systems.