Hierarchical Generative Adversarial Imitation Learning with Mid-level Input Generation for Autonomous Driving on Urban Environments
Abstract: Deriving robust control policies for realistic urban navigation scenarios is not a trivial task. In an end-to-end approach, these policies must map high-dimensional images from the vehicle's cameras to low-level actions such as steering and throttle. While pure Reinforcement Learning (RL) approaches rely exclusively on engineered rewards, Generative Adversarial Imitation Learning (GAIL) agents learn from expert demonstrations while interacting with the environment, which favors GAIL on tasks for which a reward signal is difficult to derive, such as autonomous driving. However, training deep networks directly from raw images on RL tasks is known to be unstable and troublesome. To address this, this work proposes a hierarchical GAIL-based architecture (hGAIL) that decouples representation learning from the driving task in order to solve the autonomous navigation of a vehicle. The proposed architecture consists of two modules: a Generative Adversarial Network (GAN), which generates an abstract mid-level input representation, namely the Bird's-Eye View (BEV) of the vehicle's surroundings; and the GAIL module, which learns to control the vehicle using the GAN's BEV predictions as input. hGAIL learns both the policy and the mid-level representation simultaneously as the agent interacts with the environment. Our experiments, conducted in the CARLA simulation environment, show that GAIL trained exclusively on camera images (without BEV) fails to learn the task at all, whereas hGAIL, after training exclusively on one city, successfully navigated 98% of the intersections of a new city not seen during training. Videos and code available at: https://sites.google.com/view/hgail
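The two-module hierarchy described above can be sketched as a simple data-flow pipeline. The following is a minimal, hypothetical illustration only: the GAN generator and the GAIL policy are stubbed with random linear maps (all class names, dimensions, and weights are assumptions, not the paper's implementation), to show how camera input is first mapped to a mid-level BEV representation and only then to low-level controls.

```python
import numpy as np

rng = np.random.default_rng(0)

class BEVGenerator:
    """Stand-in for the GAN generator: camera images -> BEV representation."""
    def __init__(self, cam_dim, bev_dim):
        # Random weights for illustration; the paper uses a conv GAN.
        self.W = rng.standard_normal((bev_dim, cam_dim)) * 0.01

    def __call__(self, camera_frames):
        return np.tanh(self.W @ camera_frames)

class GAILPolicy:
    """Stand-in for the GAIL policy: BEV -> (steering, throttle)."""
    def __init__(self, bev_dim, act_dim=2):
        self.W = rng.standard_normal((act_dim, bev_dim)) * 0.01

    def __call__(self, bev):
        return np.tanh(self.W @ bev)  # actions bounded in [-1, 1]

# One forward pass through the hierarchy.
cameras = rng.standard_normal(3 * 64 * 64)          # flattened RGB input
gen = BEVGenerator(cam_dim=cameras.size, bev_dim=128)
policy = GAILPolicy(bev_dim=128)

bev = gen(cameras)            # mid-level representation (learned online)
steer, throttle = policy(bev) # low-level vehicle controls
```

The key design point this sketch reflects is the decoupling: the policy never sees raw pixels, only the abstract BEV, which in the paper is what allows both modules to be trained simultaneously while the agent interacts with the environment.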