- The paper introduces a novel approach to forecasting trajectories using attention-based Transformer Networks, challenging traditional LSTM models.
- It demonstrates superior performance on the TrajNet benchmark, achieving lower Mean Average Displacement (MAD) and Final Displacement Error (FDE) than competing methods, and it remains effective when observations are missing.
- The study explores BERT's potential for multi-modal predictions, highlighting the importance of larger datasets for enhanced results.
The paper presents a novel approach to trajectory forecasting by employing Transformer Networks, diverging from the conventional reliance on Long Short-Term Memory (LSTM) models. The shift from sequential, step-by-step processing to attention over the whole observed sequence is the key change behind the improvements in trajectory prediction. The paper focuses on the original Transformer Network (TF) and the Bidirectional Transformer (BERT), both of which originate from and have been most successful in natural language processing (NLP).
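To make the shift concrete, here is a minimal sketch of an attention-based trajectory forecaster in the spirit of the paper, not the authors' exact architecture: the displacement-style input, the layer sizes, and all names below are illustrative assumptions.

```python
# Minimal sketch (PyTorch): embed observed (x, y) steps, add sinusoidal positional
# encodings, and decode future steps with a standard encoder-decoder Transformer.
import math
import torch
import torch.nn as nn


def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal positional encoding, shape (seq_len, d_model)."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe


class TrajectoryTransformer(nn.Module):
    def __init__(self, d_model: int = 128, nhead: int = 8, num_layers: int = 3):
        super().__init__()
        self.embed = nn.Linear(2, d_model)   # (x, y) step -> model dimension
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, 2)    # model dimension -> predicted (x, y) step
        self.d_model = d_model

    def forward(self, obs: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T_obs, 2) observed steps; tgt: (batch, T_pred, 2) shifted targets
        src = self.embed(obs) + positional_encoding(obs.size(1), self.d_model)
        dst = self.embed(tgt) + positional_encoding(tgt.size(1), self.d_model)
        causal = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.transformer(src, dst, tgt_mask=causal)
        return self.head(out)                # (batch, T_pred, 2)


# Toy usage: 8 observed steps in, 12 predicted steps out (a TrajNet-style horizon).
model = TrajectoryTransformer()
pred = model(torch.randn(4, 8, 2), torch.randn(4, 12, 2))
print(pred.shape)  # torch.Size([4, 12, 2])
```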
Key Contributions
- Simplicity in Modeling: Unlike existing models that incorporate complex social and scene interactions, the proposed method predicts each person's trajectory independently. This simple yet effective approach questions whether the intricate interaction modeling typical of LSTM-based methods is actually necessary.
- Performance Evaluation: The Transformer model ranks first on the challenging TrajNet benchmark, which aggregates diverse datasets for trajectory forecasting. When extended to predict multiple plausible future trajectories, it also performs on par with more complex methods on other datasets such as ETH+UCY.
- Handling Missing Data: The attention mechanism of Transformers allows effective processing even when observations are missing, a common issue with sensor data in real-world applications (a small masking sketch follows this list).
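As a rough illustration of that last point, and not necessarily the paper's exact recipe, a key-padding mask can tell a Transformer encoder which observation steps are missing so they are simply excluded from attention; the shapes and the choice of which step is missing are assumptions made for the example.

```python
# Illustrative sketch: drop a missing observation from attention with a padding mask.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=3)

batch, t_obs, d_model = 4, 8, 128
x = torch.randn(batch, t_obs, d_model)               # already-embedded observed steps
missing = torch.zeros(batch, t_obs, dtype=torch.bool)
missing[:, 3] = True                                  # pretend the 4th observation was lost

# True entries are excluded from attention; the remaining steps are still processed
# in parallel, unlike an LSTM that has to step through the gap.
memory = encoder(x, src_key_padding_mask=missing)
print(memory.shape)                                   # torch.Size([4, 8, 128])
```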
Results and Implications
- Numerical Strengths: The paper reports that the Transformer Network outperforms existing methods on the TrajNet benchmark. Specifically, it achieves lower Mean Average Displacement (MAD) and Final Displacement Error (FDE) than leading approaches that model social interactions (a short sketch of how these metrics are computed follows this list).
- Contrast with LSTM: The paper provides evidence of the Transformer model's superiority over LSTM, especially for long-term prediction. Because attention sees the whole observed sequence at once instead of stepping through it, the model can be processed in parallel, copes better with gaps in the input, and degrades less over long horizons.
- Potential of BERT: Although the paper explores BERT's application to trajectory forecasting, the results highlight the need for larger datasets to fully leverage its capabilities, hinting at potential future growth areas.
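For reference, the two reported metrics can be computed as in the sketch below; the array layout (num_agents, T_pred, 2) is an assumption of the example, not the benchmark's exact evaluation code.

```python
# Sketch of the two reported metrics for predicted vs. ground-truth trajectories.
import numpy as np

def mad(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Average Displacement: L2 error averaged over all predicted timesteps."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def fde(pred: np.ndarray, gt: np.ndarray) -> float:
    """Final Displacement Error: L2 error at the last predicted timestep only."""
    return float(np.linalg.norm(pred[:, -1] - gt[:, -1], axis=-1).mean())

pred, gt = np.zeros((10, 12, 2)), np.ones((10, 12, 2))
print(mad(pred, gt), fde(pred, gt))  # both ~1.414 for this toy input
```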
Theoretical and Practical Implications
The move towards Transformer Networks reshapes the understanding of sequence modeling in trajectory forecasting. It challenges the established models, suggesting that complex interaction terms may not be as crucial as previously thought. This simpler, yet robust modeling can inspire further research into reducing model complexity while maintaining or improving performance.
Future Directions
- Expansion to Larger Datasets: Given the vast data requirements for BERT, future studies involving larger trajectory datasets could reveal more about its applicability and performance enhancements.
- Exploration of Multi-modal Predictions: The Transformer's capacity for generating multi-modal predictions aligns well with real-world scenarios where multiple future paths are plausible. This aspect could be explored further to develop richer predictive capabilities (a toy sampling sketch follows this list).
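One simple way to produce several plausible futures, shown here purely as an illustration rather than the mechanism used in the paper, is to have the decoder output a per-step Gaussian and draw K trajectory samples, which can then be scored best-of-K; all names and values below are assumptions.

```python
# Illustrative multi-modal prediction: sample K futures from per-step Gaussians.
import torch

def sample_futures(mu: torch.Tensor, sigma: torch.Tensor, k: int = 20) -> torch.Tensor:
    """mu, sigma: (T_pred, 2) per-step mean and std; returns (k, T_pred, 2) samples."""
    eps = torch.randn(k, *mu.shape)
    return mu + sigma * eps

mu = torch.zeros(12, 2)            # hypothetical decoder outputs for one pedestrian
sigma = 0.3 * torch.ones(12, 2)
futures = sample_futures(mu, sigma, k=20)
print(futures.shape)               # torch.Size([20, 12, 2])
```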
In summary, this paper introduces a transformative approach to trajectory forecasting through the use of attention-based Transformer Networks. By demonstrating superior performance over traditional LSTM models and highlighting its utility in handling missing data, the paper paves the way for future research and applications in AI-driven motion prediction systems.