From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting (2012.01526v1)

Published 2 Dec 2020 in cs.CV, cs.AI, and cs.RO

Abstract: Human trajectory forecasting is an inherently multi-modal problem. Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long term goals and (b) sources that are unknown to both the agent & the model, such as intent of other agents & irreducible randomness in decisions. We propose to factorize this uncertainty into its epistemic & aleatoric sources. We model the epistemic uncertainty through multimodality in long term goals and the aleatoric uncertainty through multimodality in waypoints & paths. To exemplify this dichotomy, we also propose a novel long term trajectory forecasting setting, with prediction horizons up to a minute, an order of magnitude longer than prior works. Finally, we present Y-net, a scene compliant trajectory forecasting network that exploits the proposed epistemic & aleatoric structure for diverse trajectory predictions across long prediction horizons. Y-net significantly improves previous state-of-the-art performance on both (a) the well studied short prediction horizon settings on the Stanford Drone & ETH/UCY datasets and (b) the proposed long prediction horizon setting on the re-purposed Stanford Drone & Intersection Drone datasets.

Authors (4)

Karttikeya Mangalam (32 papers)
Yang An (19 papers)
Harshayu Girase (4 papers)
Jitendra Malik (211 papers)

Citations (229)


Summary

The paper proposes a hierarchical Y-net that separately models epistemic and aleatoric uncertainties for long-term human trajectory forecasting.
It utilizes a heatmap-based U-net architecture integrating scene context and past motion history to align trajectory predictions with environmental semantics.
Empirical results on the SDD and ETH/UCY benchmarks show significant improvements in Average Displacement Error (ADE), with gains of over 50% in the long-term forecasting setting.

An Overview of "From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting"

Human trajectory forecasting is challenging because human motion is inherently multimodal. A forecast must account for two kinds of uncertainty: epistemic uncertainty, which arises from long-term goals that are known to the agent but not to the model, and aleatoric uncertainty, which arises from randomness in decisions and from interactions with the environment and other agents. The paper "From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting" addresses this by modeling the two sources of uncertainty separately.
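
One way to write this factorization is sketched below. The notation is an assumption made here for illustration and is not taken from the paper: $X$ is the observed past trajectory, $S$ the scene, $g$ a long-term goal, $w$ a set of intermediate waypoints, and $Y$ the future path.

$$p(Y \mid X, S) = \sum_{g} p(g \mid X, S) \sum_{w} p(w \mid g, X, S)\, p(Y \mid w, g, X, S)$$

Under this reading, the spread of $p(g \mid X, S)$ carries the epistemic multimodality (which goal the agent has in mind), while the spread of the two goal-conditioned terms carries the aleatoric multimodality (how the agent actually gets there).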

Methodological Insights

The authors propose a hierarchical model, $\textsf{Y}$-net, which handles epistemic and aleatoric uncertainty separately to predict human trajectories over horizons of up to one minute. It uses a heatmap-based U-net architecture that draws on scene semantics and past motion history. The network consists of three main components:

Trajectory on Scene Heatmap Representation: The model represents trajectory and scene context on a spatial grid, aligning the trajectory data with scene semantics.
Epistemic Uncertainty Modeling: It predicts a probability distribution over possible long-term goals, factoring in the epistemic uncertainty related to latent decision variables.
Aleatoric Uncertainty Modeling: Conditioned on sampled goals, it models diverse paths and predicts intermediate waypoints, addressing the aleatoric uncertainty stemming from extrinsic factors.
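
The three components above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Gaussian rasterization, the softmax sampling, and in particular `midpoint_waypoint_logits` (a hypothetical stand-in for a learned goal-conditioned decoder) are all assumptions chosen to show the data flow from heatmaps, to sampled goals, to goal-conditioned waypoints.

```python
import numpy as np


def trajectory_to_heatmaps(traj_xy, height, width, sigma=2.0):
    """Rasterize T (x, y) pixel positions into T normalized Gaussian heatmaps.

    Assumes every position lies inside (or close to) the grid.
    Returns an array of shape (T, height, width).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = []
    for x, y in traj_xy:
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        m = np.exp(-d2 / (2.0 * sigma ** 2))
        maps.append(m / m.sum())
    return np.stack(maps)


def sample_from_heatmap(logits, num_samples, rng, temperature=1.0):
    """Draw (x, y) pixel coordinates from a 2D logit map via a softmax."""
    h, w = logits.shape
    flat = logits.ravel() / temperature
    flat = flat - flat.max()  # subtract the max for numerical stability
    probs = np.exp(flat)
    probs /= probs.sum()
    idx = rng.choice(h * w, size=num_samples, p=probs)
    ys, xs = np.unravel_index(idx, (h, w))
    return np.stack([xs, ys], axis=1)


def midpoint_waypoint_logits(last_pos, goal, height, width, sigma=3.0):
    """Hypothetical stand-in for a goal-conditioned waypoint decoder.

    Returns a log-Gaussian bump centred midway between the last observed
    position and the sampled goal.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    mx, my = (np.asarray(last_pos) + np.asarray(goal)) / 2.0
    return -((xs - mx) ** 2 + (ys - my) ** 2) / (2.0 * sigma ** 2)


rng = np.random.default_rng(0)
H, W = 32, 32

# 1. Past motion as heatmaps on the scene grid.
past = np.array([[4.0, 4.0], [6.0, 5.0], [8.0, 6.0]])
past_maps = trajectory_to_heatmaps(past, H, W)

# 2. Goal multimodality: random logits stand in for a network's goal map.
goal_logits = rng.normal(size=(H, W))
goals = sample_from_heatmap(goal_logits, num_samples=5, rng=rng)

# 3. Path multimodality: one waypoint sampled per goal, conditioned on it.
waypoints = np.concatenate([
    sample_from_heatmap(midpoint_waypoint_logits(past[-1], g, H, W), 1, rng)
    for g in goals
])
```

Sampling goals first and waypoints second mirrors the hierarchy described above: several different goals can be drawn for the same history, and several different waypoints can be drawn for the same goal.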

Empirical Outcomes

The model is evaluated on established datasets, the Stanford Drone Dataset (SDD) and the ETH/UCY benchmarks, in both short-term (up to 4.8 seconds) and long-term (up to 60 seconds) settings. $\textsf{Y}$-net outperforms prior state-of-the-art methods by significant margins: in the short-term setting it improves ADE (Average Displacement Error) by 26.9% on SDD and 5.6% on ETH/UCY, and in the long-term setting it improves forecast metrics by more than 50%. These results support the effectiveness of factorizing multimodality into goal and path components.
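
For readers unfamiliar with the metric, the sketch below shows how ADE and the related Final Displacement Error (FDE) are commonly computed, including the best-of-K variant often used for multimodal forecasters. The function names are hypothetical, and whether the figures quoted above use exactly this protocol is an assumption.

```python
import numpy as np


def ade_fde(pred, gt):
    """ADE and FDE for one predicted path.

    pred, gt: arrays of shape (T, 2) holding (x, y) per future time step.
    ADE is the mean Euclidean error over all steps, FDE the error at the last step.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)
    return dists.mean(), dists[-1]


def best_of_k(preds, gt):
    """Minimum ADE and minimum FDE over K sampled paths.

    preds: array of shape (K, T, 2); gt: array of shape (T, 2).
    The two minima are taken independently, so they may come from different samples.
    """
    dists = np.linalg.norm(preds - gt[None], axis=-1)  # shape (K, T)
    return dists.mean(axis=1).min(), dists[:, -1].min()


# Toy example: ground truth moves along the x-axis.
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
preds = np.stack([gt + [0.0, 1.0], gt + [0.0, 3.0]])  # offsets of 1 and 3 units
single_ade, single_fde = ade_fde(preds[0], gt)
min_ade, min_fde = best_of_k(preds, gt)
```

Because only the best of the K samples is scored, a model that spreads its samples over several plausible goals is rewarded as long as one of them lands near the true future.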

Theoretical and Practical Implications

The theoretical contribution of $\textsf{Y}$-net is the validation of a factorized approach to modeling uncertainty in trajectory forecasting, which clarifies how the stochastic nature of human motion can be handled over extended prediction horizons. Practically, the approach is promising for applications in autonomous navigation, robotics, and human interaction modeling, where understanding human intent and movement is crucial.

Future Directions

The paper suggests several avenues for future research, including the extension of the framework to other dynamic environments or agent types, the exploration of different architectures for the separate uncertainty components, and improving the robustness of scene-compliance for more complex real-world scenarios.

In conclusion, the paper provides a substantial contribution to the domain of human trajectory forecasting, presenting a clear framework for addressing the multimodal nature of human motion and setting a new benchmark for long-term prediction of human trajectories.
