Convolutional Social Pooling for Vehicle Trajectory Prediction (1805.06771v1)

Published 15 May 2018 in cs.CV

Abstract: Forecasting the motion of surrounding vehicles is a critical ability for an autonomous vehicle deployed in complex traffic. Motion of all vehicles in a scene is governed by the traffic context, i.e., the motion and relative spatial configuration of neighboring vehicles. In this paper we propose an LSTM encoder-decoder model that uses convolutional social pooling as an improvement to social pooling layers for robustly learning interdependencies in vehicle motion. Additionally, our model outputs a multi-modal predictive distribution over future trajectories based on maneuver classes. We evaluate our model using the publicly available NGSIM US-101 and I-80 datasets. Our results show improvement over the state of the art in terms of RMS values of prediction error and negative log-likelihoods of true future trajectories under the model's predictive distribution. We also present a qualitative analysis of the model's predicted distributions for various traffic scenarios.

Authors: Nachiket Deo, Mohan M. Trivedi

Summary

This paper addresses the problem of predicting vehicle trajectories in complex traffic, an essential capability for autonomous driving systems. It proposes a Long Short-Term Memory (LSTM) encoder-decoder that uses convolutional social pooling to model the interdependencies in the motion of neighboring vehicles. The work focuses on freeway traffic and is evaluated on the NGSIM US-101 and I-80 datasets.

Model Architecture

The proposed model uses an LSTM encoder-decoder and incorporates convolutional social pooling to capture spatial interdependencies between vehicles. It departs from the social pooling layer of Alahi et al. (Social LSTM) by replacing its fully connected layers with convolutional layers, which preserve the spatial structure of the pooled neighbor information and improve generalization. The model also introduces a maneuver-based decoder to handle the multi-modal nature of vehicle motion: it estimates probabilities over a set of maneuver classes and predicts future motion conditioned on each class.
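The pooling step can be sketched in plain Python to make the tensor shapes concrete. This is a minimal illustration under stated assumptions, not the authors' implementation. The 13×3 grid (rows along the road, one column per lane), the 15-unit cell length, the 3×3 then 3×1 kernels, the 2×1 max pooling, the channel counts, the ReLU activation and the random weights are all illustrative choices. Fixed vectors stand in for the neighbors' LSTM encoder states. The helper names (`build_social_tensor`, `conv2d_relu`, `max_pool`, `random_weights`) are hypothetical.

```python
import random

def build_social_tensor(neighbors, enc_dim, rows=13, cols=3, cell_len=15.0):
    """Place neighbor encodings into a grid centred on the target vehicle.

    neighbors: list of (lane_offset, long_offset, encoding). lane_offset is
    -1/0/+1 (left/same/right lane), long_offset is the distance ahead (+) or
    behind (-) the target, and encoding is a list of enc_dim floats that stands
    in for a neighbor's LSTM encoder state (an assumption for this sketch).
    Returns a tensor indexed as [channel][row][col].
    """
    grid = [[[0.0] * cols for _ in range(rows)] for _ in range(enc_dim)]
    centre = rows // 2
    for lane_offset, long_offset, enc in neighbors:
        r = centre + int(round(long_offset / cell_len))
        c = lane_offset + cols // 2
        if 0 <= r < rows and 0 <= c < cols:  # vehicles outside the grid are dropped
            for k in range(enc_dim):
                grid[k][r][c] = enc[k]
    return grid

def conv2d_relu(x, w, b):
    """'Valid' 2-D cross-correlation followed by ReLU (the activation is assumed).

    x: [C_in][H][W], w: [C_out][C_in][kh][kw], b: [C_out].
    Returns [C_out][H - kh + 1][W - kw + 1].
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    out = []
    for o in range(len(w)):
        fmap = []
        for i in range(h - kh + 1):
            row = []
            for j in range(wd - kw + 1):
                s = b[o]
                for c in range(c_in):
                    for di in range(kh):
                        for dj in range(kw):
                            s += w[o][c][di][dj] * x[c][i + di][j + dj]
                row.append(max(s, 0.0))
            fmap.append(row)
        out.append(fmap)
    return out

def max_pool(x, ph, pw):
    """Non-overlapping max pooling (stride = pool size); leftover rows/cols are dropped."""
    out = []
    for fmap in x:
        h, w = len(fmap), len(fmap[0])
        out.append([[max(fmap[i + a][j + c] for a in range(ph) for c in range(pw))
                     for j in range(0, w - pw + 1, pw)]
                    for i in range(0, h - ph + 1, ph)])
    return out

def random_weights(c_out, c_in, kh, kw, rng):
    """Hypothetical random filter weights and zero biases for the demo."""
    w = [[[[rng.uniform(-0.5, 0.5) for _ in range(kw)] for _ in range(kh)]
          for _ in range(c_in)] for _ in range(c_out)]
    return w, [0.0] * c_out

rng = random.Random(0)
enc_dim = 4
neighbors = [(-1, 30.0, [1.0] * enc_dim),   # left lane, two cells ahead
             (0, -15.0, [0.5] * enc_dim),   # same lane, one cell behind
             (1, 120.0, [2.0] * enc_dim)]   # too far ahead: outside the grid
grid = build_social_tensor(neighbors, enc_dim)             # 4 x 13 x 3
w1, b1 = random_weights(8, enc_dim, 3, 3, rng)
w2, b2 = random_weights(4, 8, 3, 1, rng)
pooled = max_pool(conv2d_relu(conv2d_relu(grid, w1, b1), w2, b2), 2, 1)  # 4 x 4 x 1
social_features = [v for fmap in pooled for row in fmap for v in row]
print(len(social_features))  # 16
```

Because the same kernel slides over every cell, a neighbor that moves one cell forward is seen by the same filter weights, which is the property a fully connected layer over the flattened grid lacks. In an encoder-decoder of this kind, `social_features` would then typically be combined with the target vehicle's own encoding before decoding.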

Methodology

Convolutional Social Pooling: The encoded motion histories of neighboring vehicles are arranged in a spatial grid (the social tensor) around the target vehicle, and convolutional and max-pooling layers are applied to this grid. Because convolutional filters share weights across grid locations, patterns learned for a neighbor in one cell carry over to nearby cells, whereas a fully connected social pooling layer treats every cell independently. This yields better generalization across the varied spatial configurations of vehicles seen in traffic.
Maneuver-Based Decoding: The network classifies lateral and longitudinal maneuvers and predicts a distribution over future trajectories conditioned on each maneuver class, so the overall prediction is a mixture that reflects the multi-modality of driver behavior. This matters because driving is non-deterministic: the same observed history can be followed by several plausible actions, such as keeping the lane or changing lanes.
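The way lateral and longitudinal classifier outputs could be turned into mixture weights is sketched below. It assumes three lateral classes (keep lane, left lane change, right lane change) and two longitudinal classes (normal driving, braking). It also assumes the joint maneuver probability factorizes as P(lat | X) · P(lon | X). The logits are hypothetical classifier outputs. In a full model, each joint class would condition the trajectory decoder (for example, via a one-hot maneuver vector), giving one predicted distribution per class weighted by these probabilities.

```python
import math

# Assumed maneuver classes: three lateral and two longitudinal.
LATERAL = ("keep_lane", "left_change", "right_change")
LONGITUDINAL = ("normal", "braking")

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def maneuver_probabilities(lat_logits, lon_logits):
    """Joint maneuver probabilities, assuming P(lat, lon | X) = P(lat | X) * P(lon | X)."""
    p_lat = softmax(lat_logits)
    p_lon = softmax(lon_logits)
    return {(lat, lon): pa * pb
            for lat, pa in zip(LATERAL, p_lat)
            for lon, pb in zip(LONGITUDINAL, p_lon)}

def most_likely_maneuver(probs):
    """Maneuver class with the highest joint probability."""
    return max(probs, key=probs.get)

# Hypothetical classifier outputs for one target vehicle.
probs = maneuver_probabilities([2.0, 0.5, -1.0], [1.0, -0.5])
print(most_likely_maneuver(probs))    # ('keep_lane', 'normal')
print(round(sum(probs.values()), 6))  # 1.0
```

The six joint probabilities sum to one, so they can serve directly as the weights of a six-component mixture over future trajectories.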

Results and Comparative Analysis

The model is evaluated using the root-mean-square error (RMSE) of predicted positions and the negative log-likelihood (NLL) of the true future trajectories under the predicted distribution. The convolutional social pooling model (CS-LSTM) outperforms the baselines, which range from a constant velocity model to more complex interaction-aware models such as a variational Gaussian mixture model with an interaction module. Its lower RMSE indicates more accurate trajectory predictions.
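RMSE for trajectory prediction is usually reported separately for each prediction horizon. The sketch below computes it per future time step, averaged over a set of trajectories. It assumes equal-length trajectories of (x, y) positions in consistent units. The constant velocity baseline shown here simply repeats the last observed per-step displacement; this is an assumed, simplified form of that baseline, and the function names are hypothetical.

```python
import math

def rmse_by_horizon(predictions, ground_truth):
    """RMSE of predicted (x, y) positions at each future time step.

    predictions, ground_truth: lists of trajectories, each a list of (x, y)
    tuples of the same length. Returns one RMSE value per time step,
    averaged over all trajectories.
    """
    n_steps = len(ground_truth[0])
    result = []
    for t in range(n_steps):
        sq_errors = [(p[t][0] - g[t][0]) ** 2 + (p[t][1] - g[t][1]) ** 2
                     for p, g in zip(predictions, ground_truth)]
        result.append(math.sqrt(sum(sq_errors) / len(sq_errors)))
    return result

def constant_velocity(history, n_steps):
    """Baseline: extrapolate the last observed per-step displacement."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, n_steps + 1)]

history = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]  # observed positions
truth = [(0.0, 3.0), (0.0, 4.5), (0.0, 6.0)]    # the vehicle accelerates
pred = constant_velocity(history, len(truth))
print(rmse_by_horizon([pred], [truth]))  # [0.0, 0.5, 1.0]
```

The error of the constant velocity baseline grows with the horizon as soon as the vehicle changes speed, which is why per-horizon RMSE is a useful way to compare it with interaction-aware models.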

Adding maneuver-based predictions (CS-LSTM(M)) further lowers the NLL, indicating that the predictive distribution better fits the recorded trajectories. This supports the value of multi-modal prediction for capturing real driving behavior.
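The following sketch shows how an NLL can be computed under a maneuver-weighted mixture of bivariate Gaussians, and why a mixture can score better than a single mode. It assumes that each maneuver's prediction is a bivariate Gaussian per time step, parameterized by means, standard deviations and a correlation coefficient. It also assumes time steps are independent given the maneuver. It scores a whole trajectory; per-time-step variants are also common. All parameter values are hypothetical.

```python
import math

def bivariate_gaussian_logpdf(x, y, mux, muy, sx, sy, rho):
    """Log-density of a 2-D Gaussian with std devs sx, sy and correlation rho."""
    dx, dy = (x - mux) / sx, (y - muy) / sy
    one_minus_r2 = 1.0 - rho * rho
    z = dx * dx + dy * dy - 2.0 * rho * dx * dy
    return (-math.log(2.0 * math.pi * sx * sy * math.sqrt(one_minus_r2))
            - z / (2.0 * one_minus_r2))

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mixture_nll(truth, maneuver_probs, params):
    """NLL of a true trajectory under a maneuver-weighted Gaussian mixture.

    truth: list of (x, y) future positions.
    maneuver_probs: dict maneuver -> P(m | X).
    params: dict maneuver -> list of (mux, muy, sx, sy, rho), one per time step.
    """
    terms = []
    for m, p in maneuver_probs.items():
        if p > 0.0:
            ll = sum(bivariate_gaussian_logpdf(x, y, *theta)
                     for (x, y), theta in zip(truth, params[m]))
            terms.append(math.log(p) + ll)
    return -logsumexp(terms)

truth = [(3.0, 0.0)]  # hypothetical: the vehicle ended up in the adjacent lane
unimodal = mixture_nll(truth, {"keep": 1.0},
                       {"keep": [(0.0, 0.0, 1.0, 1.0, 0.0)]})
bimodal = mixture_nll(truth, {"keep": 0.5, "change": 0.5},
                      {"keep": [(0.0, 0.0, 1.0, 1.0, 0.0)],
                       "change": [(3.0, 0.0, 1.0, 1.0, 0.0)]})
print(unimodal > bimodal)  # True
```

A unimodal prediction that commits to keeping the lane is penalized heavily when the driver changes lanes. The mixture keeps probability mass on both outcomes and therefore assigns the true trajectory a higher likelihood.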

Implications and Future Directions

The convolutional social pooling approach advances the state of the art in trajectory prediction by modeling vehicle interactions more robustly, which is critical for autonomous decision-making in dynamic traffic. It is a promising direction for applications that need accurate predictions in complex, interactive traffic scenarios.

Further research could extend the model with additional contextual cues, such as visual data and map information, which would likely improve maneuver classification by providing richer context about the environment. Evaluating the model in other traffic environments, such as urban road networks or intersections, could also broaden its applicability.

Conclusion

The paper makes a significant contribution to vehicle trajectory prediction by combining a convolutional social pooling mechanism with a maneuver-based decoding strategy. The approach captures both the interactions between vehicles and the multi-modality of driver behavior in dynamic traffic, and it improves predictive accuracy over existing models. Continued work in this area is an important step toward safer and more efficient autonomous transportation systems.
