- The paper presents an end-to-end deep reinforcement learning model using A3C for race car control directly from visual input in simulation.
- The model achieves robust driving at an average speed of 73 km/h in simulation, showing effective handling and promising generalization to unseen tracks.
- The end-to-end approach demonstrates potential for advancing autonomous navigation in complex scenarios and bridging simulated training with real-world applications.
End-to-End Race Driving with Deep Reinforcement Learning: An Overview
The paper "End-to-End Race Driving with Deep Reinforcement Learning" presents an advancement in the field of autonomous driving by introducing an end-to-end model that uses deep reinforcement learning (DRL) to control a racing car in a simulated environment. This research focuses on utilizing the Asynchronous Actor Critic (A3C) framework to learn car control directly from RGB image inputs, without the need for intermediate perceptual tasks like scene understanding or object recognition. The work is tested within the World Rally Championship 6 (WRC6) racing game, which provides a complex environment with varying road structures and physical conditions.
Methodology
The proposed method diverges from the traditional perception-planning-control pipeline, instead training a convolutional neural network (CNN) followed by a recurrent neural network (RNN) end to end. The network is optimized with reinforcement learning against a scalar reward designed to encourage fast, efficient driving while penalizing deviation from the road center and misalignment between the car's heading and the road.
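The paper does not publish its implementation, so the following is only a minimal PyTorch sketch of the kind of CNN-plus-recurrent actor-critic the text describes. The layer sizes, the 84x84 input resolution, and the choice of an LSTM cell for the recurrent part are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class RaceDriverNet(nn.Module):
    """CNN encoder over raw RGB frames, an LSTM for temporal context,
    and separate actor (policy) and critic (value) heads."""

    def __init__(self, num_actions=32):
        super().__init__()
        # Convolutional encoder; input assumed to be a 3x84x84 RGB frame.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        # 64 channels x 7 x 7 spatial map after the three convolutions.
        self.lstm = nn.LSTMCell(64 * 7 * 7, 256)
        self.actor = nn.Linear(256, num_actions)  # logits over discrete commands
        self.critic = nn.Linear(256, 1)           # state-value estimate

    def forward(self, frame, hidden):
        x = F.relu(self.conv1(frame))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        h, c = self.lstm(x.flatten(1), hidden)
        return self.actor(h), self.critic(h), (h, c)
```

The recurrent state lets the policy integrate information across frames (e.g., the car's momentum), which a single image cannot convey.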
Key components of the approach include:
- Control Strategy: The model outputs one of 32 discrete control commands covering lateral (steering) and longitudinal (acceleration/braking) actions, including the handbrake to enable drift maneuvers.
- Reward Shaping: The reward function strongly affects learning efficiency and driving behavior. The paper evaluates several reward strategies and shows that incorporating the distance from the road center speeds up learning and reduces collisions.
- Agent Initialization: To better cover the track distribution and improve generalization, agents are initialized at random checkpoints along the training tracks rather than always at the start, fostering exploration and reducing overfitting. A hedged sketch of these three components follows this list.
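The sketch below illustrates all three components together. The steering/longitudinal grid, the bin values, and the exact reward expression are illustrative assumptions; the paper only specifies that there are 32 commands and that speed, heading angle, and distance from the road center enter the reward.

```python
import math
import random

# Hypothetical action grid, for illustration only: the paper reports 32
# discrete commands spanning steering, acceleration/brake, and the handbrake,
# but does not fix this particular factorization.
STEER_BINS = [-1.0, -0.5, 0.0, 0.5, 1.0]
LONGITUDINAL = ["accelerate", "brake", "coast", "handbrake"]
ACTIONS = [(s, l) for s in STEER_BINS for l in LONGITUDINAL]

def shaped_reward(speed, angle, dist_center):
    """Reward fast, well-aligned driving near the road center.

    angle: heading error relative to the road axis, in radians.
    dist_center: lateral offset from the road center, normalized to [0, 1].
    One plausible form consistent with the paper's description; the exact
    constants and functional form used by the authors may differ.
    """
    return speed * (math.cos(angle) - abs(dist_center))

def sample_start(checkpoints):
    """Spawn each episode at a random checkpoint instead of the start line,
    so workers see the whole track early in training."""
    return random.choice(checkpoints)
```

Multiplying the alignment and center-distance terms by speed means the agent is only rewarded for going fast when it is also well placed on the road, which discourages reckless straight-line acceleration into corners.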
Results and Discussion
The paper reports evaluation on multiple training tracks under varying conditions. The model drives robustly, averaging 73 km/h while negotiating turns and bends with a reduced crash rate. The asynchronous learning framework sped up exploration and training, but performance dropped during high-speed racing on challenging tracks, such as snowy ones, where slip dynamics are harder to master.
Interestingly, the research also demonstrates promising generalization to unseen tracks and to real video inputs, suggesting that models trained in simulation can potentially adapt to real-world conditions. Because the model is optimized for racing, it does not inherently respect real-world constraints such as speed limits; tests showed improved behavior when speed was capped according to road curvature and infrastructure norms, as in the sketch below.
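The paper does not state its exact constraining rule; a standard way to derive a curvature-based speed cap is the friction-circle bound below, where the friction coefficient `mu` is a placeholder, not a value from the paper.

```python
import math

def curvature_speed_cap(curvature_per_m, mu=0.7, g=9.81):
    """Largest speed (m/s) that keeps lateral acceleration within mu * g.

    Derived from v**2 * kappa <= mu * g on a curve of curvature kappa (1/m).
    mu = 0.7 is an assumed friction coefficient for dry asphalt.
    """
    if curvature_per_m < 1e-6:
        return float("inf")  # effectively straight road: no curvature-based cap
    return math.sqrt(mu * g / curvature_per_m)

# Example: a 50 m radius bend (kappa = 0.02 per m) caps speed at
# roughly 18.5 m/s, about 67 km/h.
```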
Implications and Future Directions
The implications of this paper are notable for both gaming AI and, potentially, broader autonomous vehicle control. By advancing end-to-end control architectures, the paper builds foundational understanding that may accelerate development in domains requiring autonomous navigation through complex scenarios.
Future research may further refine reward functions to balance speed with safety, and investigate hybrid models that combine end-to-end learning with modular, perception-based systems. Expanding training to more diverse environments with real-time adjustments, while maintaining safety protocols, will be vital for real-world applications. As end-to-end models mature, blending simulated training with real-world data collection could improve transfer learning, bridging the gap to practical AI deployment in autonomous systems.