- The paper proposes SafeDAgger, a novel imitation learning framework that reduces the number of expert queries by using a learned safety policy to assess risk.
- It employs a binary classifier to selectively query only in unsafe states, thereby improving training efficiency and convergence.
- Empirical evaluation in the TORCS simulator demonstrates that SafeDAgger completes more laps and incurs less damage than supervised learning and standard DAgger baselines.
Overview of Query-Efficient Imitation Learning for End-to-End Autonomous Driving
The paper "Query-Efficient Imitation Learning for End-to-End Autonomous Driving" proposes a novel approach to improving imitation learning techniques in the context of autonomous vehicle control. This work introduces SafeDAgger, an extension of the existing DAgger algorithm, addressing its query inefficiency during training phases.
Background and Motivation
Imitation learning is integral to end-to-end autonomous driving: it derives a policy that maps sensory inputs, such as camera images, to driving actions by imitating an expert driver. Early neural network models like ALVINN, and later work using deeper architectures, have shown this to be effective, but purely supervised approaches often yield suboptimal policies because the states visited by the learned policy drift away from those in the expert's training distribution. DAgger mitigates this mismatch by letting the learned policy drive, collecting the states it visits, and having the expert label each of them, then retraining the policy on the aggregated data. Its drawback is that it queries the expert at every visited state, which is operationally expensive, especially when the expert is a human driver.
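To make the query cost concrete, the following is a minimal sketch of a vanilla DAgger loop in Python. The `env`, `policy`, `expert_action`, and `train` interfaces are hypothetical placeholders rather than the paper's implementation; the point is that the expert is queried at every visited state.

```python
# Minimal DAgger sketch (illustrative; not the paper's code).
# env, policy, expert_action, and train are hypothetical interfaces.

def dagger(env, policy, expert_action, train, iterations=5, steps=1000):
    dataset = []  # aggregated (state, expert_label) pairs
    for _ in range(iterations):
        state = env.reset()
        for _ in range(steps):
            action = policy(state)                         # learned policy drives
            dataset.append((state, expert_action(state)))  # expert labels EVERY visited state
            state = env.step(action)
        policy = train(policy, dataset)                    # retrain on aggregated data
    return policy
```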
SafeDAgger Extension
SafeDAgger augments DAgger with a safety policy that predicts, from the current observation alone, whether the primary policy is about to deviate substantially from the reference (expert) policy. Because this prediction does not require querying the expert, it can be made at every step, and expert consultations can be reserved for the states where they matter. The safety policy is trained to identify these critical or dangerous states, focusing the training effort on resolving the largest behavioral discrepancies.
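One way to read this is as a binary classification problem over states. The sketch below constructs training labels for such a safety policy from recorded primary-policy and expert actions; the L1 distance and the threshold value are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def safety_labels(primary_actions, expert_actions, tau=0.01):
    """Label each state 1 (unsafe) if the primary policy's action deviates
    from the expert's by more than tau, else 0 (safe).

    The L1 distance and tau=0.01 are illustrative assumptions."""
    primary = np.asarray(primary_actions, dtype=float)
    expert = np.asarray(expert_actions, dtype=float)
    deviation = np.abs(primary - expert).sum(axis=-1)  # per-state action deviation
    return (deviation > tau).astype(np.int64)
```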
Methodology
SafeDAgger employs a binary safety policy that tags a state as unsafe when the predicted deviation of the primary policy from the reference policy exceeds a chosen threshold, and as safe otherwise. During data collection, only states flagged as unsafe are labeled by the reference policy, which also takes over control in those states; safe states are driven by the primary policy with no expert query. This selective querying drastically reduces the number of expert consultations. The primary policy is refined iteratively on the aggregated data, and the safety policy is updated alongside it so that it reflects the errors of the current primary policy in newly encountered driving contexts.
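The sketch below illustrates one such data-collection iteration under the same hypothetical interfaces as the earlier DAgger sketch, with `safety_policy(state)` returning True when a state is predicted to be unsafe. It is a simplified reading of the procedure, not the authors' implementation.

```python
def safe_dagger_iteration(env, policy, safety_policy, expert_action, train,
                          dataset, steps=1000):
    state = env.reset()
    for _ in range(steps):
        if safety_policy(state):              # predicted unsafe
            action = expert_action(state)     # query the expert; expert takes over
            dataset.append((state, action))   # only unsafe states are labeled
        else:                                 # predicted safe
            action = policy(state)            # primary policy drives, no expert query
        state = env.step(action)
    # Retrain the primary policy on the aggregated data; the safety policy
    # would be retrained on the same data as well.
    policy = train(policy, dataset)
    return policy, dataset
```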
Empirical Evaluation
Using the TORCS simulator, SafeDAgger was benchmarked against standard DAgger and a purely supervised baseline. Performance was measured by the number of laps completed and the damage incurred, both with and without traffic. SafeDAgger significantly lowered the number of expert queries while maintaining or improving driving performance, confirming its advantage in both efficiency and policy quality.
Implications and Future Directions
SafeDAgger's methodology economizes on expert input, making it suitable for scenarios where expert interaction is time-consuming or costly. Reducing expert dependency makes training complex end-to-end models more feasible and accelerates convergence. The safety policy may also pave the way for driving systems that eventually surpass the expert, for example by applying reinforcement learning after SafeDAgger training. Future work could explore adaptive tuning of the safety threshold and examine robustness across more varied driving scenarios.
This paper contributes substantively to autonomous vehicle control by providing a query-efficient, scalable approach to imitation learning; the safety policy it introduces could be a vital stepping stone toward more autonomous, reliable driving systems.