- The paper introduces MILE, a framework that learns urban driving from offline camera-only data without relying on reward signals.
- It integrates 3D geometry within a latent dynamics model using a bird’s-eye-view representation to accurately model static scenes and dynamic interactions.
- Empirical results on the CARLA simulator show a 31% improvement in driving score, demonstrating enhanced generalization across new towns and weather conditions.
Model-Based Imitation Learning for Urban Driving: An Expert Overview
The paper "Model-Based Imitation Learning for Urban Driving" introduces a novel approach called MILE (Model-based Imitation LEarning), which leverages a model-based framework to enhance autonomous driving systems. This research situates itself at the converging domains of imitation learning, 3D scene understanding, and world modeling, aiming to tackle the intricate challenges posed by urban driving environments.
Key Contributions
MILE jointly learns a world model and a driving policy from high-dimensional visual inputs, using 3D geometry as an inductive bias. Unlike prior methods that rely on reward-based reinforcement learning or require extensive online interaction with the environment, MILE learns entirely from an offline driving dataset, with no reward signal.
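To make the reward-free setup concrete, here is a minimal sketch of training a world model and a policy jointly from offline sequences. The module names, sizes, and mean-squared losses are illustrative simplifications, not MILE's actual architecture:

```python
import torch
import torch.nn as nn

# Minimal sketch of reward-free, model-based imitation learning from offline data.
# All module names, sizes, and losses are illustrative, not MILE's actual architecture.
class TinyWorldModelPolicy(nn.Module):
    def __init__(self, obs_dim=64, latent_dim=32, action_dim=2):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = nn.Linear(obs_dim, latent_dim)                    # observation -> embedding
        self.dynamics = nn.GRUCell(latent_dim + action_dim, latent_dim)  # latent transition
        self.decoder = nn.Linear(latent_dim, obs_dim)                    # latent -> reconstruction
        self.policy = nn.Linear(latent_dim, action_dim)                  # latent -> action

    def forward(self, obs_seq, act_seq):
        # obs_seq: (T, B, obs_dim); act_seq: (T, B, action_dim) expert actions
        T, B, _ = obs_seq.shape
        h = torch.zeros(B, self.latent_dim)
        prev_act = torch.zeros(B, act_seq.size(-1))
        recon_loss = action_loss = 0.0
        for t in range(T):
            e = self.encoder(obs_seq[t])
            h = self.dynamics(torch.cat([e, prev_act], dim=-1), h)
            recon_loss = recon_loss + (self.decoder(h) - obs_seq[t]).pow(2).mean()   # model the world
            action_loss = action_loss + (self.policy(h) - act_seq[t]).pow(2).mean()  # imitate the expert
            prev_act = act_seq[t]
        return recon_loss + action_loss  # note: no reward term anywhere

model = TinyWorldModelPolicy()
loss = model(torch.randn(10, 4, 64), torch.randn(10, 4, 2))  # 10 steps, batch of 4
loss.backward()
```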
Among its notable contributions, MILE is the first camera-only method to jointly model the static scene, dynamic agents, and ego-vehicle behavior in an urban driving context. Operating without LiDAR, it sets a new state of the art on the CARLA simulator, improving the driving score by 31% over prior methods such as LAV and Roach, and showing substantially better generalization to a new town and new weather conditions.
Methodological Foundations
The MILE architecture is built around a latent dynamics model trained on sequences of observations and expert actions. It uses a bird’s-eye-view (BeV) representation, formed by lifting image features into 3D space, which gives the model an explicit geometric frame for modeling the environment without any reward supervision.
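As a rough illustration of the lifting step, the sketch below assigns each image feature a categorical depth distribution, projects it to 3D along camera rays, and sum-pools the result onto a ground-plane grid, in the spirit of Lift-Splat-style lifting. The shapes, depth range, ray inputs, and grid parameters are all assumptions for the example:

```python
import torch

# Illustrative sketch of lifting image features into a bird's-eye-view (BeV) grid.
# Shapes, depth range, and grid size are assumed; MILE's lifting follows the same
# depth-distribution idea with learned components and real camera geometry.
def lift_to_bev(feats, depth_logits, pixel_rays, bev_size=(50, 50), cell=1.0):
    """feats: (C, H, W) image features; depth_logits: (D, H, W) per-pixel depth
    distribution; pixel_rays: (3, H, W) unit ray directions in the ego frame."""
    C, H, W = feats.shape
    D = depth_logits.shape[0]
    depth_probs = depth_logits.softmax(dim=0)                  # (D, H, W)
    depths = torch.linspace(1.0, 50.0, D).view(D, 1, 1, 1)     # candidate depths (metres)
    points = pixel_rays.unsqueeze(0) * depths                  # (D, 3, H, W) 3D points
    lifted = depth_probs.unsqueeze(1) * feats.unsqueeze(0)     # (D, C, H, W) weighted feats

    # Quantise each 3D point to a BeV cell (x forward, y left), clamped to the grid.
    xs = (points[:, 0] / cell + bev_size[0] // 2).long().clamp(0, bev_size[0] - 1)
    ys = (points[:, 1] / cell + bev_size[1] // 2).long().clamp(0, bev_size[1] - 1)
    flat_idx = xs * bev_size[1] + ys                           # (D, H, W)

    # "Splat": sum-pool every lifted feature into its BeV cell.
    bev = torch.zeros(C, bev_size[0] * bev_size[1])
    for d in range(D):
        idx = flat_idx[d].reshape(1, -1).expand(C, -1)
        bev.scatter_add_(1, idx, lifted[d].reshape(C, -1))
    return bev.view(C, *bev_size)

feats = torch.randn(16, 8, 12)                                 # toy feature map
depth_logits = torch.randn(32, 8, 12)                          # 32 depth bins
rays = torch.nn.functional.normalize(torch.randn(3, 8, 12), dim=0)
print(lift_to_bev(feats, depth_logits, rays).shape)            # torch.Size([16, 50, 50])
```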
The inference model is probabilistic: it approximates the temporal dynamics of driving with stochastic latent states, accommodating the inherent uncertainty of urban environments. This design provides robustness to noisy sensor inputs and to the unpredictable behavior of other road users.
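The sketch below shows the standard recurrent state-space pattern such probabilistic models follow: a prior predicts the next latent state from history alone, a posterior also conditions on the current observation, and a KL term pulls the two together. The layer types and sizes are assumptions, not MILE's exact parameterization:

```python
import torch
import torch.nn as nn
import torch.distributions as dist

# Sketch of a probabilistic latent transition: the prior predicts from history alone,
# the posterior also sees the current observation. Sizes are illustrative.
latent_dim, obs_dim = 32, 64
prior_net = nn.Linear(latent_dim, 2 * latent_dim)                # history -> (mu, log_sigma)
posterior_net = nn.Linear(latent_dim + obs_dim, 2 * latent_dim)  # history + obs -> (mu, log_sigma)

def gaussian(params):
    mu, log_sigma = params.chunk(2, dim=-1)
    return dist.Normal(mu, log_sigma.exp())

h = torch.zeros(1, latent_dim)                      # recurrent history state
obs = torch.randn(1, obs_dim)                       # current observation embedding

prior = gaussian(prior_net(h))                                    # what the model expects
posterior = gaussian(posterior_net(torch.cat([h, obs], dim=-1)))  # what actually happened

z = posterior.rsample()                             # stochastic latent state (reparameterised)
kl_loss = dist.kl_divergence(posterior, prior).sum(-1).mean()  # trains prior to match posterior
```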
Empirical Evaluation
The empirical evaluation of MILE on CARLA demonstrates state-of-the-art performance, with significant improvements across driving metrics. Notably, MILE achieves high route completion while committing few infractions, indicating proficient navigation and consistent adherence to traffic rules.
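For context on the headline metric: CARLA-style driving scores combine route completion with a multiplicative infraction penalty, which is why both metrics matter. A small helper illustrates the arithmetic; the penalty coefficients are assumed for the example:

```python
# Back-of-the-envelope sketch of how CARLA-style driving scores combine metrics:
# driving score = route completion (%) x infraction penalty, where each infraction
# multiplies the penalty down. Coefficients here are assumed for illustration.
def driving_score(route_completion, infractions, penalties):
    """route_completion in [0, 100]; infractions maps infraction type -> count."""
    penalty = 1.0
    for kind, count in infractions.items():
        penalty *= penalties[kind] ** count
    return route_completion * penalty

penalties = {"red_light": 0.7, "collision_pedestrian": 0.5}  # assumed coefficients
print(driving_score(95.0, {"red_light": 1}, penalties))      # 66.5
```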
Additionally, MILE can predict diverse and plausible future states and actions, enabling it to execute complex driving maneuvers entirely in imagination, such as negotiating a roundabout or swerving to avoid a motorcyclist. This capacity for predictive rollouts marks a notable step forward for planning with learned world models in autonomous driving.
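A stripped-down sketch of such an imagined rollout: once past observations have set the latent state, the model advances it with its own dynamics and lets the policy act on the imagined states, with no new camera frames. All modules and sizes here are stand-ins:

```python
import torch
import torch.nn as nn

# Sketch of "driving in imagination": roll the latent state forward with the learned
# dynamics and act on imagined states, without observing any new frames.
latent_dim, action_dim = 32, 2
dynamics = nn.GRUCell(action_dim, latent_dim)  # advances the latent state given an action
policy = nn.Linear(latent_dim, action_dim)     # imagined state -> action

h = torch.randn(1, latent_dim)                 # latent state inferred from observed frames
trajectory = []
with torch.no_grad():
    for _ in range(10):                        # imagine 10 future steps
        a = torch.tanh(policy(h))              # act on the imagined state
        h = dynamics(a, h)                     # predict the next state from the action alone
        trajectory.append(a)
```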
Implications and Future Directions
The implications of this research extend to both practical applications and theoretical advancements in AI. Practically, the capability to operate using camera-only setups has potential for real-world deployments, reducing reliance on expensive LiDAR systems. Theoretically, MILE contributes to the understanding of model-based learning frameworks in dynamic environments, offering insights into integrating visual inputs and latent space representations for behavior modeling.
Future research may explore inferring driving reward functions from expert data, enhancing the planning capabilities within the world model. Additionally, advancing self-supervised techniques could mitigate the dependency on semantic segmentation labels, unlocking broader applications across various robotic domains.
In conclusion, MILE represents a significant advance in model-based imitation learning for autonomous driving, demonstrating the value of incorporating 3D geometric priors into learning frameworks. As the field progresses, the methodologies and insights from this work are likely to catalyze further innovations in building more intelligent and adaptable autonomous systems.