VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation (2005.04259v1)

Published 8 May 2020 in cs.CV, cs.LG, and stat.ML

Abstract: Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights). This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird-eye images and encode them with convolutional neural networks (ConvNets), our approach operates on a vector representation. By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet's capability in learning context features, we propose a novel auxiliary task to recover the randomly masked out map entities and agent trajectories based on their context. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset. Our method achieves on par or better performance than the competitive rendering approach on both benchmarks while saving over 70% of the model parameters with an order of magnitude reduction in FLOPs. It also outperforms the state of the art on the Argoverse dataset.

Summary

The paper introduces VectorNet, a hierarchical graph neural network that encodes HD maps and agent trajectories for behavior prediction.
It transforms road geometry and agent motion into vectorized forms, preserving spatial structure while significantly reducing computational load.
Experiments on an in-house benchmark and on Argoverse show that VectorNet performs on par with or better than a competitive rendering-based ConvNet baseline while using over 70% fewer parameters and an order of magnitude fewer FLOPs.

Analysis of "VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation"

This paper presents VectorNet, an approach that uses vectorized representations for behavior prediction in multi-agent systems, with a particular focus on autonomous driving. The authors propose a hierarchical graph neural network (GNN) that encodes high-definition (HD) maps and agent dynamics more efficiently than methods that apply convolutional neural networks (ConvNets) to rasterized images.

Core Contributions

VectorNet handles complex road components and their interactions by representing the geometry of HD maps and agent trajectories as sets of vectors, which form the basis of a graph structure. By replacing rasterized map rendering with a graph-based model, the approach significantly reduces the computational load and preserves the structured nature of the input data.
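To make the vectorized representation concrete, the following is a minimal sketch of how a sampled polyline might be turned into a set of vectors. The `Vector` dataclass, its field names, and the `kind` attribute are hypothetical choices made for illustration; the summary here only states that vectors carry spatial locations and attributes, so the exact layout is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Vector:
    """One directed segment of a polyline (hypothetical layout for illustration)."""
    start: Point
    end: Point
    kind: str         # e.g. "lane" or "agent"; the attribute set is an assumption
    polyline_id: int  # which polyline this vector belongs to


def polyline_to_vectors(points: List[Point], kind: str, polyline_id: int) -> List[Vector]:
    """Connect consecutive sample points into directed vectors."""
    if len(points) < 2:
        raise ValueError("a polyline needs at least two points")
    return [Vector(a, b, kind, polyline_id) for a, b in zip(points, points[1:])]


# A lane centerline sampled at three points and an agent track sampled at four time steps.
lane = polyline_to_vectors([(0.0, 0.0), (5.0, 0.0), (10.0, 1.0)], "lane", 0)
agent = polyline_to_vectors([(1.0, -1.0), (2.0, -1.0), (3.0, -0.8), (4.0, -0.5)], "agent", 1)
scene = lane + agent

print(len(lane), len(agent), len(scene))  # prints: 2 3 5
```

Map elements and trajectories end up in the same format, so a single encoder can consume both without rendering anything to pixels.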

Methodology

Vector Representation: The authors convert map features and trajectories into vector sets, allowing the use of GNNs to model the data. These vectors encapsulate spatial locations and attributes needed for accurate behavior prediction.
Hierarchical Graph Structure: The network architecture has two tiers:
- Polyline Subgraphs: These capture local information by connecting all vectors that belong to the same polyline.
- Global Interaction Graph: A fully connected graph to capture global interactions among polylines, encoded using self-attention mechanisms.
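The two tiers can be sketched in plain Python as follows. This is an illustrative simplification, not the authors' implementation: it assumes a single dense layer with ReLU as the node encoder, element-wise max-pooling as the aggregator within a polyline, and dot-product self-attention with identity query/key/value projections. All function names and the random weights are hypothetical.

```python
import math
import random
from typing import List

Vec = List[float]


def linear_relu(x: Vec, w: List[Vec], b: Vec) -> Vec:
    """One dense layer followed by ReLU; w holds one row per output unit."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi) for row, bi in zip(w, b)]


def max_pool(rows: List[Vec]) -> Vec:
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(col) for col in zip(*rows)]


def subgraph_layer(nodes: List[Vec], w: List[Vec], b: Vec) -> List[Vec]:
    """Encode each node, then append the pooled summary of its polyline."""
    encoded = [linear_relu(n, w, b) for n in nodes]
    pooled = max_pool(encoded)
    return [h + pooled for h in encoded]  # list concatenation


def polyline_feature(nodes: List[Vec], w: List[Vec], b: Vec) -> Vec:
    """One subgraph layer, then max-pooling down to a single vector per polyline."""
    return max_pool(subgraph_layer(nodes, w, b))


def softmax(xs: Vec) -> Vec:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats: List[Vec]) -> List[Vec]:
    """Fully connected dot-product attention over polyline features."""
    out = []
    for q in feats:
        weights = softmax([sum(a * c for a, c in zip(q, k)) for k in feats])
        out.append([sum(wt * v[d] for wt, v in zip(weights, feats)) for d in range(len(q))])
    return out


rng = random.Random(0)
in_dim, hidden = 4, 8
w = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden)]
b = [0.0] * hidden

# Each node is (start_x, start_y, end_x, end_y); attributes are omitted for brevity.
lane = [[0.0, 0.0, 5.0, 0.0], [5.0, 0.0, 10.0, 1.0]]
agent = [[1.0, -1.0, 2.0, -1.0], [2.0, -1.0, 3.0, -0.8], [3.0, -0.8, 4.0, -0.5]]

polyline_feats = [polyline_feature(p, w, b) for p in (lane, agent)]
context = self_attention(polyline_feats)
print(len(polyline_feats), len(polyline_feats[0]), len(context), len(context[0]))  # prints: 2 16 2 16
```

Because the local tier ends in a max-pool, the polyline feature does not depend on the order in which a polyline's vectors are listed, and polylines with different numbers of vectors all map to features of the same size before the global tier runs.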
Auxiliary Node Completion Task: Beyond predicting future trajectories, the model is trained on an auxiliary task that asks it to restore randomly masked node features from their context, which encourages it to learn interactions between agents and map elements.
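The masking step behind such an auxiliary objective might look like the sketch below. It is a simplified illustration under stated assumptions: masked features are replaced by zeros, the mask ratio is arbitrary, and a Huber loss is used as the reconstruction penalty. The helper names `mask_nodes` and `huber` are hypothetical, and no actual model is trained here.

```python
import random
from typing import Dict, List, Tuple

Vec = List[float]


def mask_nodes(feats: List[Vec], ratio: float, rng: random.Random) -> Tuple[List[Vec], Dict[int, Vec]]:
    """Zero out a random subset of node features and return the hidden originals."""
    count = max(1, round(ratio * len(feats)))
    hidden = rng.sample(range(len(feats)), count)
    targets = {i: list(feats[i]) for i in hidden}
    masked = [[0.0] * len(f) if i in targets else list(f) for i, f in enumerate(feats)]
    return masked, targets


def huber(pred: Vec, target: Vec, delta: float = 1.0) -> float:
    """Mean Huber loss between a predicted and a true feature vector."""
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        total += 0.5 * err * err if err <= delta else delta * (err - 0.5 * delta)
    return total / len(pred)


rng = random.Random(0)
feats = [[0.2, 1.0, 0.5], [0.9, 0.1, 0.3], [0.4, 0.4, 0.8], [0.7, 0.6, 0.2]]
masked, targets = mask_nodes(feats, ratio=0.25, rng=rng)

# A real model would predict the hidden features from the unmasked context;
# here the all-zero masked input stands in as a trivial "prediction".
loss = sum(huber(masked[i], t) for i, t in targets.items()) / len(targets)
print(sorted(targets), round(loss, 4))
```

In training, a reconstruction term of this kind would be added to the trajectory prediction loss, so the encoder is rewarded for representations from which missing map or agent features can be inferred.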

Results

VectorNet is evaluated on two datasets: an in-house behavior prediction benchmark and the publicly available Argoverse forecasting dataset. On both, it performs on par with or better than the rasterized ConvNet baseline at a much lower computational cost, using over 70% fewer parameters and an order of magnitude fewer FLOPs. On the Argoverse dataset, VectorNet also surpasses previous state-of-the-art approaches.

Implications and Future Directions

VectorNet provides an efficient and effective method for encoding map and agent dynamics, showing great potential for real-time applications in autonomous driving systems where computational efficiency is critical. The elimination of ConvNets, along with the proposed vectorized representation, could inspire similar methodologies in other domains involving structured inputs.

Further extensions could explore multi-modal trajectory prediction that outputs a probability distribution over several plausible futures. Moreover, constructing the graph incrementally as new data arrives could make the approach more practical for live autonomous driving systems.

In conclusion, VectorNet showcases a pragmatic shift towards vectorized representations in behavior prediction, presenting an important step forward for scalable and efficient autonomous driving solutions.
