Learning to Drive in a Day (1807.00412v2)

Published 1 Jul 2018 in cs.LG, cs.AI, cs.RO, and stat.ML

Abstract: We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision. We discuss the challenges and opportunities to scale this approach to a broader range of autonomous driving tasks.

Authors (9)
  1. Alex Kendall (23 papers)
  2. Jeffrey Hawke (6 papers)
  3. David Janz (13 papers)
  4. Przemyslaw Mazur (3 papers)
  5. Daniele Reda (10 papers)
  6. John-Mark Allen (1 paper)
  7. Vinh-Dieu Lam (2 papers)
  8. Alex Bewley (30 papers)
  9. Amar Shah (11 papers)
Citations (594)

Summary

Learning to Drive in a Day

The paper "Learning to Drive in a Day" presents a notable application of deep reinforcement learning (RL) applied to autonomous driving, offering an alternative to traditional rule-based systems and imitation learning. The authors posit that RL's adaptability and corrective capabilities make it suitable for driving tasks where the state-action space is complex and dynamic.

Key Contributions

  1. Autonomous Driving as an MDP: The authors frame driving as a Markov Decision Process (MDP), relying on little beyond monocular imagery and basic vehicle telemetry. The reward, the distance traveled before the safety driver takes control, is simple to obtain yet sufficient for the task (a minimal episode-loop sketch follows this list).
  2. Application of DDPG: The research employs Deep Deterministic Policy Gradients (DDPG) to handle continuous action spaces, vital for the nuanced control needed in driving. The learning occurs directly on the vehicle, eliminating dependency on simulated environments for policy optimization.
  3. Real-World Experimentation: The paper's implementation on an actual vehicle—a modified Renault Twizy—demonstrates the RL framework's viability. The on-vehicle computation and episodic learning loop allow real-time policy adjustments.
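As referenced in the first contribution, the sketch below illustrates the episode structure implied by this MDP formulation: the state is a monocular image plus basic telemetry, the per-step reward is the distance traveled, and the episode ends when the safety driver intervenes. The interface names (`vehicle`, `get_observation`, `driver_intervened`, and so on) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the on-vehicle episode loop: state is a monocular
# image plus telemetry, reward is distance covered per step, and the episode
# terminates on safety-driver intervention. All interface names are illustrative.

def run_episode(vehicle, agent, dt=0.1):
    """Run one on-vehicle training episode and return the collected transitions."""
    transitions = []
    state = vehicle.get_observation()          # monocular image + speed, steering
    while not vehicle.driver_intervened():
        action = agent.act(state)              # continuous steering / speed set-point
        vehicle.apply(action)
        next_state = vehicle.get_observation()
        reward = vehicle.speed() * dt          # distance travelled this step
        transitions.append((state, action, reward, next_state))
        state = next_state
    # intervention ends the episode with no further reward
    return transitions
```

Because this reward requires no map, labels, or hand-crafted lane geometry, it can be collected on any road the safety driver is willing to supervise.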

Methodology

The state space is defined by monocular images and vehicle dynamics, processed by a convolutional network shared between the actor and critic in the DDPG framework. This shared structure keeps learning efficient despite the high-dimensional input. A task-based workflow structures safety-driver interventions and episode resets, keeping on-vehicle learning practical under real-world constraints.
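A minimal PyTorch sketch of this structure follows: a small convolutional encoder shared by the DDPG actor and critic, with the critic additionally consuming the action. Layer sizes and the two-dimensional action (steering, speed set-point) are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Convolutional encoder shared between actor and critic (sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )

    def forward(self, image):
        return self.conv(image)

class Actor(nn.Module):
    def __init__(self, encoder, feat_dim, action_dim=2):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                  nn.Linear(64, action_dim), nn.Tanh())

    def forward(self, image):
        # bounded continuous actions, e.g. steering and speed set-point in [-1, 1]
        return self.head(self.encoder(image))

class Critic(nn.Module):
    def __init__(self, encoder, feat_dim, action_dim=2):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Linear(feat_dim + action_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, image, action):
        # Q(s, a): image features concatenated with the action
        return self.head(torch.cat([self.encoder(image), action], dim=-1))

# Usage: infer the flattened feature size with a dummy forward pass.
encoder = SharedEncoder()
with torch.no_grad():
    feat_dim = encoder(torch.zeros(1, 3, 64, 128)).shape[-1]
actor, critic = Actor(encoder, feat_dim), Critic(encoder, feat_dim)
```

Sharing the encoder means both networks learn from the same visual features, which helps when on-vehicle data is scarce.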

Results and Implications

The experiments show that RL can achieve lane following within a small number of training episodes, supporting RL's potential as a scalable approach to autonomous driving. The modular design, relying solely on onboard computation, sets the stage for RL as an alternative to map-dependent driving systems.

Strong Numerical Results:

  • The VAE-enhanced model was markedly more efficient, achieving successful lane following in just 11 training episodes, compared with 35 for the raw-pixel approach (a sketch of this compressed state representation follows).
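The sketch below, under the same hedged assumptions as the earlier snippets, shows how a convolutional VAE encoder can compress each monocular image into a low-dimensional latent vector that is fed to the actor-critic in place of raw pixels; the latent dimensionality and layer choices are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ConvVAEEncoder(nn.Module):
    """Compress a camera frame into a low-dimensional latent state (illustrative sizes)."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.LazyLinear(latent_dim)       # mean of q(z|x)
        self.fc_logvar = nn.LazyLinear(latent_dim)   # log-variance of q(z|x)

    def forward(self, image):
        h = self.backbone(image)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return z, mu, logvar
```

Feeding the RL agent a compact latent state rather than raw pixels is what plausibly accounts for the reported gain in sample efficiency.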

Challenges and Opportunities

While the approach demonstrated efficacy for lane following, expanding to complex urban scenarios remains a significant challenge. Future work could investigate enhanced reward functions and advanced state representations incorporating semantic segmentation and predictive models.

Speculation on Future Developments:

The combination of reinforcement learning with semi-supervised learning and domain transfer could significantly reduce the data and training required for comprehensive autonomous driving solutions. Additionally, advances in model-based RL approaches could provide more data-efficient training by learning state transitions more effectively.

In conclusion, this paper provides a foundational framework for leveraging RL in autonomous driving, identifying both the potential and the obstacles in scaling to full-spectrum driving tasks. This work should motivate further exploration of RL within the autonomous systems community, blending techniques from various domains to overcome current limitations.
