An Analysis of "Learning by Cheating"
Summary
- The paper introduces a two-stage imitation-learning approach in which a "privileged" agent with access to ground-truth environmental state first learns to act, simplifying the action-learning problem.
- Decoupling perception from action enables efficient training of a vision-only sensorimotor agent under on-policy supervision from the privileged teacher.
- Empirical evaluation on the CARLA and NoCrash benchmarks shows significant gains, including a 100% success rate on CARLA and an order-of-magnitude reduction in driving infractions.
The paper "Learning by Cheating" presents an intriguing two-stage approach to enhance the effectiveness of imitation learning for vision-based autonomous urban driving. The authors propose an innovative methodology that simplifies the complex task of training autonomous systems by leveraging privileged information during an initial training phase and subsequently transferring this learned knowledge to a vision-based agent.
Methodological Overview
The methodology is outlined in two distinct stages:
- Privileged Agent Training: The first stage trains an agent with direct access to ground-truth environmental state, termed the "privileged agent." Because it observes the exact traffic layout and the positions of all participants, it can "cheat" during learning: it solves only the action problem, with perception handed to it for free.
- Sensorimotor Agent Training: In the second stage, the privileged agent serves as a teacher for a sensorimotor agent that relies exclusively on visual input. The teacher's learned policy supervises the vision-only agent, which itself never receives privileged information.
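The two stages above can be sketched as follows. This is a minimal toy illustration using 1-D "states" and "images" and linear models; all class and function names are illustrative assumptions, not the paper's actual architecture or code.

```python
class PrivilegedAgent:
    """Stage 1: maps ground-truth state (e.g., a bird's-eye view of
    lanes and traffic participants) to a driving action."""
    def __init__(self):
        self.w = 0.0  # stand-in for real network parameters

    def act(self, state):
        return self.w * state

    def fit(self, states, expert_actions, lr=0.1, epochs=200):
        # Ordinary supervised imitation of recorded expert actions.
        for _ in range(epochs):
            for s, a in zip(states, expert_actions):
                self.w -= lr * 2 * (self.act(s) - a) * s

class SensorimotorAgent:
    """Stage 2: maps raw images to actions; never sees privileged state."""
    def __init__(self):
        self.w = 0.0

    def act(self, image):
        return self.w * image

def distill(teacher, student, images, states, lr=0.1, epochs=200):
    # The privileged teacher labels every frame; the student only ever
    # sees the image paired with that frame.
    for _ in range(epochs):
        for img, s in zip(images, states):
            target = teacher.act(s)
            student.w -= lr * 2 * (student.act(img) - target) * img

# Toy data: the expert's action happens to equal 2 * state, and each
# "image" numerically equals its state for simplicity.
states = [1.0, 2.0, 3.0]
teacher = PrivilegedAgent()
teacher.fit(states, [2 * s for s in states])
student = SensorimotorAgent()
distill(teacher, student, images=states, states=states)
```

Note that the student's loss is computed against the teacher's output, not against the original expert demonstrations, which is what makes the second stage a distillation step.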
Technical Insights
Decomposition Rationale
The authors argue that direct sensorimotor learning conflates two formidable tasks, perception and action. Addressing them separately makes each substantially more tractable: the privileged agent first solves the action-learning problem without perceptual noise, and the visual agent then focuses on mapping images to the actions supplied by the privileged teacher.
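The decomposition can be made concrete with a deliberately trivial sketch (hypothetical names, not the paper's code): direct learning must fit the whole image-to-action composition at once, while the two-stage scheme fits each half separately.

```python
def perceive(image):
    # Perception sub-task: recover an abstract state from pixels.
    # Here the "image" trivially encodes the state; real perception is
    # the hard part that stage 2 must learn.
    return image["state"]

def act_on(state):
    # Action sub-task: with the true state known, driving reduces to a
    # comparatively simple rule (what the privileged agent learns first).
    return "brake" if state["obstacle_ahead"] else "go"

def direct_policy(image):
    # Direct sensorimotor learning must fit this entire composition
    # jointly, entangling perception errors with action errors.
    return act_on(perceive(image))

print(direct_policy({"state": {"obstacle_ahead": True}}))   # prints "brake"
print(direct_policy({"state": {"obstacle_ahead": False}}))  # prints "go"
```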
Advantages of the Proposed Approach
The paper highlights three primary advantages of this two-stage process:
- Compact Intermediate Representation: The privileged agent operates with a simplified, abstract environmental view, accelerating learning and fostering generalization.
- Enhanced Supervision: The privileged agent can act as an on-policy oracle, labeling the states the student actually visits, which is stronger supervision than passively replicating pre-recorded expert trajectories.
- White-Box Model: The privileged agent is a transparent model that can be queried for the action it would take under any navigation command (e.g., turn left, follow the lane). This allows all branches of the sensorimotor agent's command-conditioned network to be supervised in parallel from a single frame.
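The white-box advantage can be sketched as follows; the names, the branching scheme, and the toy steering rule are illustrative assumptions, not the paper's implementation. The key point is that one frame yields a training target for every command branch at once.

```python
COMMANDS = ["follow", "left", "right", "straight"]

class WhiteBoxTeacher:
    """Privileged agent: given the ground-truth state, it can report the
    action it WOULD take under every navigation command, not just the
    command that was actually issued during data collection."""
    def actions_for_all_commands(self, state):
        # Toy rule: steer toward the commanded side, scaled by speed.
        steer = {"follow": 0.0, "left": -1.0, "right": 1.0, "straight": 0.0}
        return {cmd: steer[cmd] * state["speed"] for cmd in COMMANDS}

class BranchedStudent:
    """Vision-based agent with one output branch per command."""
    def __init__(self):
        self.weights = {cmd: 0.0 for cmd in COMMANDS}

    def predict(self, image, cmd):
        return self.weights[cmd] * image["speed"]

    def train_step(self, image, targets, lr=0.5):
        # Every branch receives supervision from the SAME frame, because
        # the white-box teacher labels all commands simultaneously.
        for cmd in COMMANDS:
            err = self.predict(image, cmd) - targets[cmd]
            self.weights[cmd] -= lr * err * image["speed"]

teacher = WhiteBoxTeacher()
student = BranchedStudent()
for _ in range(100):
    frame = {"speed": 1.0}  # image and ground truth coincide in this toy
    targets = teacher.actions_for_all_commands(frame)
    student.train_step(frame, targets)
```

With a conventional recorded expert, each frame would supervise only the branch matching the command in effect at recording time; querying the teacher removes that bottleneck.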
Empirical Results
The empirical evaluation on the CARLA and NoCrash benchmarks demonstrates the strength of this approach. The vision-based agent achieves a 100% success rate on the CARLA benchmark and sets a new state of the art on NoCrash, reducing infractions by an order of magnitude compared to previous methods. These results underscore the paper's contribution to advancing autonomous driving systems.
Implications and Future Directions
This research presents notable implications for both theory and practice. Theoretically, it underscores the potential of structured decomposition in complex learning tasks. Practically, it proposes a viable pathway for building robust vision-based driving systems. Future work could explore integrating this approach with reinforcement learning to surpass the capabilities of the initial expert, and effective sim-to-real transfer mechanisms would be needed before deployment in real-world driving.
In conclusion, "Learning by Cheating" offers a structured way to tackle the challenges inherent in training autonomous vehicles to operate in dynamic urban environments. The strategic separation of the perception and action learning tasks presents a promising direction for the development of more sophisticated autonomous systems.