MGTR: Multi-Granular Transformer for Motion Prediction with LiDAR (2312.02409v2)

Published 5 Dec 2023 in cs.CV and cs.RO

Abstract: Motion prediction has been an essential component of autonomous driving systems since it handles highly uncertain and complex scenarios involving moving agents of different types. In this paper, we propose a Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that exploits context features in different granularities for different kinds of traffic agents. To further enhance MGTR's capabilities, we leverage LiDAR point cloud data by incorporating LiDAR semantic features from an off-the-shelf LiDAR feature extractor. We evaluate MGTR on Waymo Open Dataset motion prediction benchmark and show that the proposed method achieved state-of-the-art performance, ranking 1st on its leaderboard (https://waymo.com/open/challenges/2023/motion-prediction/).

Citations (11)

Summary

  • The paper introduces MGTR, a transformer-based model that leverages multi-granular LiDAR and map features to enhance trajectory prediction.
  • It refines context understanding with motion-aware search and an encoder-decoder architecture that tailors predictions at various detail levels.
  • MGTR outperforms current benchmarks on the Waymo dataset, notably reducing trajectory miss rates for pedestrians and cyclists.

Introduction to Motion Prediction Models

Motion prediction is an integral part of autonomous driving technology. It allows vehicles to plan a safe path forward by anticipating the actions of surrounding entities like other vehicles, cyclists, and pedestrians. The accuracy of motion prediction directly impacts the safety and reliability of autonomous vehicles. Traditional methods often process sensory data into images or vectors and use convolutional neural networks (CNNs) or graph-structured models to make predictions. However, these methods have limitations, including difficulties with representing diverse context information and managing computational complexity.

Advancements with Multi-Granular TRansformer (MGTR)

The research introduces the Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that processes context features at different granularities. The model leverages LiDAR data to include detailed semantic information from the surrounding environment. Evaluated on the Waymo Open Dataset motion prediction benchmark, MGTR achieves state-of-the-art performance, ranking first on the leaderboard.

Core Concepts of MGTR

Multi-Granular Input Representation

MGTR processes inputs at various levels of detail, covering both map elements and LiDAR data. For instance, map features such as road layouts are encoded at different granularities determined by their sampling rates, while LiDAR data is processed into voxel features and transformed into tokens that encapsulate environment context. By utilizing these multi-granular tokens, MGTR can tailor the level of detail to the motion patterns and prediction needs of each agent in the scene.
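As a rough illustration of the sampling-rate idea, the sketch below resamples the same map polyline at a coarse and a fine arc-length step, yielding two token sets for one map element. The step values and the function names are hypothetical, not taken from the paper; MGTR's actual encoder operates on learned embeddings of such samples.

```python
import numpy as np

def resample_polyline(points, step):
    """Resample a 2-D polyline at a fixed arc-length step (one granularity)."""
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative arc length
    targets = np.arange(0.0, s[-1] + 1e-9, step)      # sample positions
    x = np.interp(targets, s, points[:, 0])
    y = np.interp(targets, s, points[:, 1])
    return np.stack([x, y], axis=1)

def multi_granular_tokens(polyline, steps=(1.0, 4.0)):
    """Encode the same map element at several granularities (sampling steps)."""
    return {step: resample_polyline(polyline, step) for step in steps}

# A straight 8 m lane centerline, tokenized finely (1 m) and coarsely (4 m).
lane = [(0.0, 0.0), (8.0, 0.0)]
tokens = multi_granular_tokens(lane)
print({step: pts.shape[0] for step, pts in tokens.items()})  # finer step -> more tokens
```

The finer sampling preserves local geometry useful for slow, maneuvering agents (pedestrians), while the coarser sampling gives a compact long-range view suited to fast vehicles.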

Motion-Aware Context Search

MGTR introduces motion-aware context search, which significantly improves training efficiency and prediction accuracy. Instead of treating all agents equally, this mechanism considers the velocity and movement patterns of individual agents to collect relevant scene context. By doing so, MGTR can effectively refine context features and make more precise predictions.
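One way to picture this mechanism: instead of gathering map tokens around an agent's current position, shift the search region along the agent's velocity so that fast agents pull in context from farther ahead. This is a minimal sketch under assumed parameters (constant-velocity center, fixed radius); the paper's actual search is learned within the transformer.

```python
import numpy as np

def motion_aware_context(agent_pos, agent_vel, token_pos, horizon=8.0, radius=30.0):
    """Return indices of context tokens near the agent's anticipated future
    position rather than its current one (hypothetical parameters)."""
    agent_pos = np.asarray(agent_pos, dtype=float)
    agent_vel = np.asarray(agent_vel, dtype=float)
    # Search center: midpoint of a constant-velocity rollout over the horizon.
    center = agent_pos + agent_vel * horizon * 0.5
    dists = np.linalg.norm(np.asarray(token_pos, dtype=float) - center, axis=1)
    return np.nonzero(dists <= radius)[0]

map_tokens = np.array([[0.0, 0.0], [40.0, 0.0], [120.0, 0.0]])
slow = motion_aware_context([0.0, 0.0], [0.0, 0.0], map_tokens)   # center stays at origin
fast = motion_aware_context([0.0, 0.0], [10.0, 0.0], map_tokens)  # center shifts 40 m ahead
print(slow, fast)
```

A stationary pedestrian thus attends to nearby tokens, while a vehicle at 10 m/s attends to tokens far down the road, which is what makes the collected context relevant to each agent's prediction.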

Transformer Encoder and Decoder

MGTR's encoder-decoder structure leverages local attention mechanisms to aggregate features effectively, ensuring a refined prediction model. The encoder takes into account not only the history but also potential future trajectories of agents, while the decoder uses these refined tokens and a set of intention goals to make multimodal trajectory predictions. Furthermore, the model employs a Gaussian Mixture Model (GMM) to represent multiple potential futures for any given agent, adding another layer to its prediction capabilities.
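To make the GMM output concrete, the sketch below scores a ground-truth 2-D endpoint under a K-mode, diagonal-covariance mixture, the kind of negative log-likelihood commonly minimized when training such heads. The shapes and example values are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def gmm_trajectory_nll(gt, means, log_stds, mode_logits):
    """Negative log-likelihood of a 2-D point under a K-mode diagonal GMM.
    Assumed shapes: gt (2,), means (K, 2), log_stds (K, 2), mode_logits (K,)."""
    gt, means = np.asarray(gt, dtype=float), np.asarray(means, dtype=float)
    stds = np.exp(np.asarray(log_stds, dtype=float))
    logits = np.asarray(mode_logits, dtype=float)
    log_w = logits - np.log(np.sum(np.exp(logits)))        # log-softmax mode weights
    z = (gt - means) / stds                                # standardized error per mode
    log_pdf = (-0.5 * np.sum(z * z, axis=1)
               - np.sum(np.log(stds), axis=1) - np.log(2.0 * np.pi))
    return -np.log(np.sum(np.exp(log_w + log_pdf)))        # mixture log-likelihood

means = np.array([[10.0, 0.0], [0.0, 10.0]])   # two intention modes (e.g. straight / turn)
log_stds = np.zeros((2, 2))                     # unit standard deviations
logits = np.array([0.0, 0.0])                   # equal mode weights

near = gmm_trajectory_nll([10.0, 0.0], means, log_stds, logits)
far = gmm_trajectory_nll([50.0, 50.0], means, log_stds, logits)
print(near < far)  # an endpoint sitting on a mode scores far better
```

Because each mode carries its own mean, spread, and weight, the model can commit probability mass to several distinct futures (continue straight, turn, stop) rather than averaging them into one implausible trajectory.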

Benchmarks and Results

In rigorous testing against the Waymo Open Dataset, MGTR achieved state-of-the-art performance, particularly in predicting movements for pedestrians and cyclists. The research emphasizes that these improvements are largely due to the multi-granularity approach that captures the nuanced movements of non-vehicle agents. Detailed results indicate that MGTR reduced trajectory miss rates and improved prediction accuracy across various evaluation metrics.

Conclusions

MGTR ushers in a new wave of efficiency and accuracy for motion prediction in autonomous driving. By incorporating 3D context information from LiDAR within a Transformer-based model, and leveraging multi-granularity for inputs, this model demonstrates superiority in understanding and reacting to complex driving environments. The results of this research not only serve as a benchmark for future motion prediction models but also bring autonomous driving technology one step closer to widespread and safe adoption.
