- The paper proposes an end-to-end neural framework that integrates perception and planning for autonomous driving, consuming raw LiDAR data and HD maps and evaluating candidate trajectories with a learned, dynamic cost volume.
- Experimental evaluation on real-world data shows the system reduces collision rates and traffic violations compared to baseline methods in complex urban scenarios.
- The research demonstrates the feasibility of integrating perception and planning end-to-end for autonomous driving, offering a step towards more robust neural systems.
Analysis of "End-to-end Interpretable Neural Motion Planner"
The paper presents a Neural Motion Planner (NMP) designed for autonomous driving in complex urban environments. It integrates perception and planning within a single end-to-end framework while retaining interpretable intermediate representations, a step toward self-driving systems that are both safe and trustworthy.
Framework and Model Architecture
The authors propose a holistic approach that contrasts with traditional self-driving stacks, which are divided into separate modules for perception, prediction, motion planning, and control. This paper instead introduces a unified architecture that ingests raw LiDAR data and HD maps and outputs both interpretable intermediate representations and a planned trajectory. The approach rests on a learned cost volume that evaluates the feasibility and desirability of candidate trajectories over the planning horizon, capturing the multi-modal uncertainty inherent in urban driving.
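To make the cost-volume idea concrete, the sketch below scores sampled trajectories against a spatio-temporal cost volume and keeps the cheapest. The tensor shapes, grid size, and the `trajectory_cost` helper are illustrative assumptions, not the paper's implementation.

```python
import torch

# A minimal sketch of trajectory scoring against a learned cost volume.
T, H, W = 10, 200, 200                 # planning horizon and BEV grid (assumed)
cost_volume = torch.rand(T, H, W)      # one cost map per future timestep

def trajectory_cost(volume: torch.Tensor, waypoints: torch.Tensor) -> torch.Tensor:
    """Sum the cost under each waypoint of a trajectory.
    waypoints: (T, 2) integer grid indices (row, col), one per timestep."""
    t = torch.arange(volume.shape[0])
    return volume[t, waypoints[:, 0], waypoints[:, 1]].sum()

# Score a bank of candidate trajectories and keep the cheapest one.
candidates = torch.randint(0, H, (64, T, 2))   # 64 sampled trajectories
costs = torch.stack([trajectory_cost(cost_volume, c) for c in candidates])
best = candidates[costs.argmin()]
```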
Methodological Contributions
The framework addresses well-known issues in end-to-end driving, such as a lack of interpretability and compounding errors, while also avoiding the pitfalls of modular pipelines, whose separately optimized stages can integrate sub-optimally. The method uses a deep convolutional backbone to produce 3D detections and motion forecasts, which are then used to construct a dynamic cost volume. This volume encodes the spatial and temporal cost of occupying each location over the planning horizon, enabling real-time, data-driven trajectory selection.
- Input Representation:
  - LiDAR sweeps and HD maps are rasterized into a 3D tensor, capturing spatial-temporal features for downstream processing (a toy rasterizer is sketched after this list).
- Network Architecture:
  - The backbone network extracts multi-scale features rich in contextual information, which are critical for accurate motion planning.
  - Separate headers for perception and cost-volume generation ensure that object detection and trajectory evaluation are optimized jointly (see the two-header sketch after this list).
- Trajectory Sampling and Cost Minimization:
  - The system samples physically plausible trajectories consistent with vehicle dynamics and selects the one that minimizes the learned cost, reducing planning to a search over sampled candidates (a toy sampler appears after this list).
- Training Paradigm:
  - A multi-task loss jointly supervises perception and planning, learning the mapping from raw sensor input to trajectory costs end-to-end (a schematic objective is sketched after this list).
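The input rasterization might look like the toy example below. The grid extents, resolution, and height binning are assumptions for illustration; the paper's exact parameterization differs, and HD-map layers (lanes, crossings, traffic lights) would be rasterized as additional channels.

```python
import numpy as np

# Hypothetical rasterization of a LiDAR sweep into a BEV occupancy tensor.
X_RANGE, Y_RANGE, Z_RANGE = (-70.0, 70.0), (-40.0, 40.0), (-2.0, 3.0)  # metres (assumed)
RES = 0.5      # metres per BEV cell (assumed)
Z_BINS = 10    # height discretized into channels (assumed)

def rasterize(points: np.ndarray) -> np.ndarray:
    """points: (N, 3) x, y, z coordinates in the ego frame.
    Returns a (Z_BINS, H, W) binary occupancy tensor."""
    h = int((Y_RANGE[1] - Y_RANGE[0]) / RES)
    w = int((X_RANGE[1] - X_RANGE[0]) / RES)
    grid = np.zeros((Z_BINS, h, w), dtype=np.float32)
    ix = ((points[:, 0] - X_RANGE[0]) / RES).astype(int)
    iy = ((points[:, 1] - Y_RANGE[0]) / RES).astype(int)
    iz = ((points[:, 2] - Z_RANGE[0]) / (Z_RANGE[1] - Z_RANGE[0]) * Z_BINS).astype(int)
    keep = (0 <= ix) & (ix < w) & (0 <= iy) & (iy < h) & (0 <= iz) & (iz < Z_BINS)
    grid[iz[keep], iy[keep], ix[keep]] = 1.0
    return grid
```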
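The two-header layout can be sketched as follows; the layer counts, channel sizes, and the `TwoHeadPlanner` name are invented for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class TwoHeadPlanner(nn.Module):
    """Shared backbone feeding a perception head and a cost-volume head."""
    def __init__(self, in_channels: int, horizon: int = 10, anchors: int = 2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        # Perception header: per-anchor class score + box parameters.
        self.perception = nn.Conv2d(128, anchors * (1 + 6), 1)
        # Cost-volume header: one BEV cost map per planning timestep.
        self.cost = nn.Conv2d(128, horizon, 1)

    def forward(self, bev: torch.Tensor):
        feats = self.backbone(bev)
        return self.perception(feats), self.cost(feats)

model = TwoHeadPlanner(in_channels=17)                 # e.g. LiDAR + map channels
det, cost_volume = model(torch.rand(1, 17, 160, 280))  # BEV input tensor
```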
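Trajectory sampling under vehicle dynamics could be as simple as the kinematic-bicycle rollouts below; the constants and sampling grid are assumptions, and the paper's actual sampler is more elaborate.

```python
import numpy as np

DT, T, WHEELBASE = 0.5, 10, 2.8   # timestep (s), horizon, wheelbase (m), all assumed

def rollout(v0: float, accel: float, steer: float) -> np.ndarray:
    """Integrate a constant-acceleration, constant-steering trajectory.
    Returns (T, 2) x, y waypoints in the ego frame."""
    x = y = yaw = 0.0
    v = v0
    pts = []
    for _ in range(T):
        x += v * np.cos(yaw) * DT
        y += v * np.sin(yaw) * DT
        yaw += v / WHEELBASE * np.tan(steer) * DT
        v = max(0.0, v + accel * DT)
        pts.append((x, y))
    return np.array(pts)

# A small bank of physically plausible candidates.
candidates = [rollout(v0=8.0, accel=a, steer=s)
              for a in (-2.0, 0.0, 2.0)
              for s in np.linspace(-0.2, 0.2, 9)]
```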
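One way to realize such a multi-task objective is a detection loss plus a max-margin planning term that pushes the expert trajectory's cost below each sampled alternative's. The function names, margin definitions, and weighting below are invented for illustration.

```python
import torch

def planning_margin_loss(gt_cost: torch.Tensor,
                         neg_costs: torch.Tensor,
                         margins: torch.Tensor) -> torch.Tensor:
    """gt_cost: scalar cost of the expert trajectory.
    neg_costs: (K,) costs of K sampled trajectories.
    margins:   (K,) per-sample margins, e.g. distance to the expert
               trajectory plus traffic-violation penalties."""
    return torch.relu(gt_cost - neg_costs + margins).max()

def total_loss(det_loss, gt_cost, neg_costs, margins, w_plan=1.0):
    # Multi-task objective: detection loss plus weighted planning term.
    return det_loss + w_plan * planning_margin_loss(gt_cost, neg_costs, margins)
```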
Experimental Evaluation
The authors validate their approach on a large real-world dataset collected in multiple cities across North America. The results show the model outperforming baseline methods, with notable reductions in collision rates and traffic-rule violations. On detection and motion forecasting, the NMP performs on par with or better than models designed specifically for those tasks. Crucially, the learned cost volume lets the system navigate rich urban scenarios while remaining interpretable.
Implications and Future Directions
This research underscores the viability of integrating perception and planning into a single neural framework, pushing the boundaries of what end-to-end systems can achieve. Practically, the work suggests that deep learning approaches can learn complex, safety-critical tasks without relying exclusively on manually engineered components or labels for intermediate states.
Looking forward, improving the interpretability of such systems remains pivotal, particularly for earning regulatory and public trust. Further research might integrate additional sensor modalities, or leverage unsupervised and semi-supervised learning to generalize better across diverse environments. Handling transitions between multi-modal trajectories more smoothly and capturing nuanced driver intentions could unlock even more robust self-driving capabilities.
In conclusion, the paper marks a significant step in the evolution of autonomous driving systems. By delivering a unified, interpretable framework, it establishes a platform for further innovation in end-to-end system design.