Learning to Drive in a Day

Published 1 Jul 2018 in cs.LG, cs.AI, cs.RO, and stat.ML | (1807.00412v2)

Abstract: We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision. We discuss the challenges and opportunities to scale this approach to a broader range of autonomous driving tasks.

Citations (594)

Summary

  • The paper frames autonomous driving as an MDP using sparse rewards and achieves lane following in just 11 episodes.
  • The authors implement Deep Deterministic Policy Gradients directly on a real vehicle without relying on simulators.
  • The study highlights RL’s potential for scalable autonomous driving while identifying challenges for urban driving expansion.

The paper "Learning to Drive in a Day" presents a notable application of deep reinforcement learning (RL) to autonomous driving, offering an alternative to traditional rule-based systems and imitation learning. The authors posit that RL's adaptability and capacity for self-correction make it suitable for driving tasks, where the state-action space is complex and dynamic.

Key Contributions

  1. Autonomous Driving as an MDP: The authors frame driving as a Markov Decision Process (MDP), emphasizing minimal reliance on external inputs beyond monocular imagery and basic vehicle telemetry. The reward structure, focusing on distance traveled without human intervention, is notably sparse yet functional for the task at hand.
  2. Application of DDPG: The research employs Deep Deterministic Policy Gradients (DDPG) to handle continuous action spaces, vital for the nuanced control needed in driving. The learning occurs directly on the vehicle, eliminating dependency on simulated environments for policy optimization.
  3. Real-World Experimentation: The study's implementation on an actual vehicle—a modified Renault Twizy—demonstrates the RL framework's viability. The on-vehicle computation and episodic learning loop allow real-time policy adjustments.
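The reward in (1) can be sketched as a simple accumulation of per-step distance that terminates at the first safety-driver intervention. This is a minimal illustration, not the authors' code; the field names (`speed_mps`, `dt_s`, `driver_took_over`) are assumptions about what vehicle telemetry might expose:

```python
from dataclasses import dataclass

@dataclass
class Step:
    speed_mps: float        # forward speed from vehicle telemetry (assumed field)
    dt_s: float             # control-loop interval in seconds
    driver_took_over: bool  # safety-driver intervention flag

def episode_return(steps):
    """Accumulate distance travelled (speed * dt, in metres) until the
    safety driver intervenes; the episode ends with no further reward."""
    total = 0.0
    for s in steps:
        if s.driver_took_over:
            break
        total += s.speed_mps * s.dt_s
    return total

# A 3-step episode at 2 m/s with 0.1 s steps, intervention on step 3:
steps = [Step(2.0, 0.1, False), Step(2.0, 0.1, False), Step(2.0, 0.1, True)]
print(episode_return(steps))  # 0.4 metres
```

The sparsity the authors highlight is visible here: the signal says nothing about lane position directly, only how far the car got before a human had to take over.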

Methodology

The state space is defined by images and vehicle dynamics, processed by a shared convolutional network for the actor and critic in the DDPG framework. This structure ensures efficient learning despite the expansive input space. An innovative task-based workflow facilitates driver intervention and system resets, optimizing learning under real-world constraints.
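The task-based workflow can be sketched as an episodic loop: drive under the current policy until intervention, have the driver reset the car, then run an off-policy update from the replay buffer, as DDPG permits. The sketch below uses toy stand-ins (`ToyLaneEnv`, `ToyAgent`, and a `skill` counter standing in for a real DDPG update) that are purely illustrative assumptions, not the paper's architecture:

```python
import random

class ToyLaneEnv:
    """Stand-in for the real vehicle: episodes last longer as the agent's
    internal 'skill' grows (purely illustrative dynamics)."""
    def __init__(self, agent):
        self.agent = agent
    def reset(self):
        self.t = 0          # driver repositions the car between episodes
        return 0.0
    def step(self, action):
        self.t += 1
        done = self.t > self.agent.skill  # 'intervention' once skill is exceeded
        return 0.0, 1.0, done             # 1 metre of reward per step

class ToyAgent:
    def __init__(self):
        self.skill = 0
    def act(self, state):
        return 0.0                        # actor output + exploration noise would go here
    def update(self, batch):
        self.skill += 5                   # pretend the critic/actor update helps

def run_training(env, agent, max_episodes=50, target_metres=20.0):
    """Explore on-vehicle, stop on intervention, reset, then optimise
    off-policy from the replay buffer between episodes."""
    replay = []
    for episode in range(max_episodes):
        state = env.reset()
        distance, done = 0.0, False
        while not done:
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            replay.append((state, action, reward, next_state, done))
            distance += reward
            state = next_state
        agent.update(random.sample(replay, min(len(replay), 64)))
        if distance >= target_metres:
            return episode + 1            # episodes needed to solve the task
    return None

agent = ToyAgent()
episodes_needed = run_training(ToyLaneEnv(agent), agent)
print(episodes_needed)
```

The key structural point this captures is that optimisation happens between driving episodes on the vehicle itself, with no simulator in the loop.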

Results and Implications

The experiments show that RL can achieve lane following within a handful of episodes, supporting RL's potential as a scalable solution for autonomous driving. The modular design, relying solely on onboard computation, sets the stage for RL as an alternative to map-dependent driving systems.

Strong Numerical Results:

  • The VAE-enhanced model showed remarkable efficiency, achieving successful lane following in just 11 episodes, contrasting with 35 for the pixel-based approach.
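The efficiency gain comes from the policy operating on a compact latent code rather than raw pixels. The sketch below only illustrates the dimensionality reduction; a fixed random projection stands in for the trained VAE encoder, and the image size and latent width (32) are assumptions, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(0)

# A raw monocular frame flattened to pixels vs. a compact latent state.
H, W = 60, 80
frame = rng.random((H, W)).ravel()           # 4800-dim pixel state
encoder = rng.normal(size=(frame.size, 32))  # stand-in for the VAE's encoder
latent = frame @ encoder                     # 32-dim state fed to actor/critic

print(frame.shape, latent.shape)  # (4800,) (32,)
```

With orders of magnitude fewer state dimensions, far fewer on-vehicle samples are needed before the policy gradient becomes useful, which is consistent with the 11-vs-35 episode comparison above.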

Challenges and Opportunities

While the approach demonstrated efficacy for lane following, expanding to complex urban scenarios remains a significant challenge. Future work could investigate enhanced reward functions and advanced state representations incorporating semantic segmentation and predictive models.

Speculation on Future Developments:

The combination of reinforcement learning with semi-supervised learning and domain transfer could significantly reduce the data and training required for comprehensive autonomous driving solutions. Additionally, advances in model-based RL approaches could provide more data-efficient training by learning state transitions more effectively.

In conclusion, this paper provides a foundational framework for leveraging RL in autonomous driving, identifying both the potential and the obstacles in scaling to full-spectrum driving tasks. This work should motivate further exploration of RL within the autonomous systems community, blending techniques from various domains to overcome current limitations.
