CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving (1807.03776v1)

Published 10 Jul 2018 in cs.CV and cs.RO

Abstract: Autonomous urban driving navigation with complex multi-agent dynamics is under-explored due to the difficulty of learning an optimal driving policy. The traditional modular pipeline relies heavily on hand-designed rules and a pre-processing perception system, while supervised learning-based models are limited by the accessibility of extensive human experience. We present a general and principled Controllable Imitative Reinforcement Learning (CIRL) approach which enables the driving agent to achieve higher success rates from vision inputs alone in a high-fidelity car simulator. To alleviate the low exploration efficiency of a large continuous action space, which often prohibits the use of classical RL on challenging real tasks, CIRL explores over a reasonably constrained action space guided by encoded experiences that imitate human demonstrations, building upon Deep Deterministic Policy Gradient (DDPG). Moreover, we propose specialized adaptive policies and steering-angle reward designs for different control signals (i.e. follow, straight, turn right, turn left) based on shared representations to improve the model's capability in tackling diverse cases. Extensive experiments on the CARLA driving benchmark demonstrate that CIRL substantially outperforms all previous methods in terms of the percentage of successfully completed episodes on a variety of goal-directed driving tasks. We also show its superior generalization capability in unseen environments. To our knowledge, this is the first case of a driving policy learned through reinforcement learning in a high-fidelity simulator that performs better than supervised imitation learning.

Authors (4)
  1. Xiaodan Liang (318 papers)
  2. Tairui Wang (2 papers)
  3. Luona Yang (3 papers)
  4. Eric Xing (127 papers)
Citations (253)

Summary

Analysis of CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

The advancement of autonomous driving technology necessitates the development of sophisticated models capable of navigating complex urban environments with multiple interacting agents. Traditional modular systems, which rely heavily on hand-crafted rules for perception and decision-making, often fall short in terms of adaptability and generalization. Similarly, while end-to-end supervised learning approaches have demonstrated potential, they are primarily constrained by the breadth of the training datasets, often failing to generalize to novel situations. This paper introduces a novel approach, Controllable Imitative Reinforcement Learning (CIRL), which leverages the strengths of both imitation learning and reinforcement learning to address the challenges inherent in vision-based self-driving.

The proposed CIRL framework integrates a two-stage learning paradigm to enhance exploration efficiency and driving performance. Initially, it employs a controllable imitation learning stage using human-driving data, establishing a preliminary policy by pre-training the model with deterministic actions derived from manual driving demonstrations. This stage incorporates a gating mechanism that conditions the model's actions on high-level command inputs, allowing for targeted policy learning across diverse tasks such as following lanes or executing turns.
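A minimal sketch of what such a command-gated policy might look like is shown below, assuming a PyTorch-style setup. The class name `BranchedPolicy`, the encoder dimensions, and the three-dimensional action output (steer, throttle, brake) are illustrative assumptions, not the authors' exact architecture; the point is only to show a shared visual representation feeding command-specific heads, with the navigation command selecting which head drives the action.

```python
import torch
import torch.nn as nn

# Command indices (assumed ordering): 0 = follow, 1 = straight, 2 = turn right, 3 = turn left
NUM_COMMANDS = 4

class BranchedPolicy(nn.Module):
    """Command-gated actor: a shared visual encoder feeds four command-specific
    heads; the current navigation command selects which head's output is used."""
    def __init__(self, action_dim=3):  # steer, throttle, brake (illustrative)
        super().__init__()
        self.encoder = nn.Sequential(                      # shared representation
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.branches = nn.ModuleList([                    # one head per command
            nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                          nn.Linear(128, action_dim), nn.Tanh())
            for _ in range(NUM_COMMANDS)
        ])

    def forward(self, image, command):
        feat = self.encoder(image)
        # Evaluate every branch, then gate by the command (index selection).
        all_actions = torch.stack([b(feat) for b in self.branches], dim=1)  # (B, C, A)
        idx = command.view(-1, 1, 1).expand(-1, 1, all_actions.size(-1))
        return all_actions.gather(1, idx).squeeze(1)

# Stage 1 (imitation pre-training): regress demonstrated actions under each command.
policy = BranchedPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
images = torch.randn(8, 3, 88, 200)                # placeholder camera frames
commands = torch.randint(0, NUM_COMMANDS, (8,))    # placeholder navigation commands
demo_actions = torch.rand(8, 3) * 2 - 1            # placeholder human actions in [-1, 1]
loss = nn.functional.mse_loss(policy(images, commands), demo_actions)
opt.zero_grad(); loss.backward(); opt.step()
```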

The second stage involves refining this policy through Deep Deterministic Policy Gradient (DDPG), a reinforcement learning algorithm known for its applicability to continuous action spaces. By initializing the actor network with parameters learned during the imitative stage, CIRL mitigates the extensive and inefficient exploration typical of standard RL approaches. This warm start via imitation accelerates convergence towards robust policies by seeding exploration in a more promising region of the action space. A major innovation within CIRL is its command-based stratification of policy learning, which customizes steering-angle reward mechanisms to better accommodate varied navigation directives, ensuring higher fidelity in policy execution for distinct scenarios.
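The following sketch illustrates these two ideas, again under assumptions: `warm_start_ddpg` simply loads the imitation-pretrained weights into the DDPG actor and clones a target network, and `steering_reward` shows one plausible way a command-conditioned steering term could be combined with shared speed, collision, and off-road penalties. The coefficients and the exact penalty structure are placeholders, not the paper's reward function.

```python
import copy
import torch.nn as nn

def warm_start_ddpg(actor: nn.Module, pretrained_state_dict):
    """Stage 2 warm start: initialize the DDPG actor from the imitation-pretrained
    weights so exploration begins near the demonstrated policy (sketch only)."""
    actor.load_state_dict(pretrained_state_dict)
    target_actor = copy.deepcopy(actor)  # target network starts from the same point
    return actor, target_actor

def steering_reward(command, steer, speed, collided, offroad):
    """Illustrative command-conditioned shaping: shared penalties plus a steering
    term specialized per command (weights and signs are assumptions)."""
    r = speed - 10.0 * float(collided) - 5.0 * float(offroad)
    if command in ("follow", "straight"):
        r -= 2.0 * abs(steer)           # discourage unnecessary steering
    elif command == "turn_left":
        r -= 2.0 * max(0.0, steer)      # penalize steering right on a commanded left turn
    elif command == "turn_right":
        r -= 2.0 * max(0.0, -steer)     # penalize steering left on a commanded right turn
    return r
```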

A comprehensive set of experiments on the CARLA driving benchmark underscores the effectiveness of CIRL. Compared against state-of-the-art frameworks, including modular pipelines and both imitation and reinforcement learning models, CIRL exhibits superior success rates in task completion across various settings, including previously unseen urban environments and adverse weather conditions. The results reveal not only its enhanced performance over baseline methods but also its strength in generalization, a critical metric for real-world applicability.

The implications of these findings are significant for both real-time autonomous navigation systems and the broader field of AI, showcasing a tangible pathway to bridge the gap between human imitation and autonomous decision-making. By effectively leveraging pre-learned experiences to guide efficient RL exploration, CIRL sets the stage for developing more advanced models capable of withstanding the unpredictability of urban driving.

Future prospects for research in this domain could explore the integration of more sophisticated perception algorithms to further improve situational awareness. Additionally, investigating the deployment of CIRL within edge-case scenarios and its adaptation to multi-agent coordination tasks could broaden its application scope. Ensuring scalability and robustness in real-world driving contexts will likely necessitate ongoing model refinements and empirical validations.