HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention (2404.06351v2)
Abstract: Predicting the trajectories of road agents is essential for autonomous driving systems. Recent mainstream methods follow a static paradigm, which predicts the future trajectory from a fixed duration of historical frames. These methods make predictions independently even at adjacent time steps, which can lead to instability and temporal inconsistency. Because successive time steps share largely overlapping historical frames, their forecasts should be intrinsically correlated: for example, overlapping predicted trajectories should be consistent, or should differ but share the same motion goal, depending on the road situation. Motivated by this, we introduce HPNet, a novel dynamic trajectory forecasting method. Aiming for stable and accurate trajectory forecasting, our method leverages not only historical frames, including maps and agent states, but also historical predictions. Specifically, we design a Historical Prediction Attention module to automatically encode the dynamic relationship between successive predictions. Because it uses historical predictions, the module also extends the attention range beyond the currently visible window. The proposed Historical Prediction Attention, together with Agent Attention and Mode Attention, is further formulated as the Triple Factorized Attention module, which serves as the core design of HPNet. Experiments on the Argoverse and INTERACTION datasets show that HPNet achieves state-of-the-art performance and generates accurate and stable future trajectories. Our code is available at https://github.com/XiaolongTang23/HPNet.
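The idea of letting the prediction at one time step attend to predictions made at earlier time steps can be illustrated with a minimal sketch. This is not the HPNet implementation: the function name `historical_prediction_attention`, the `window` parameter, the choice to include the current step in its own attention set, and the omission of learned query/key/value projections are all assumptions made for illustration. It only shows plain scaled dot-product attention restricted to a window of past prediction embeddings.

```python
import numpy as np

def historical_prediction_attention(pred_emb, window):
    """Hypothetical sketch: attention over past prediction embeddings.

    pred_emb: (T, D) array, one prediction embedding per time step.
    window:   how many earlier steps each step may attend to (assumed).
    Returns the attended embeddings (T, D) and the weights (T, T).
    """
    T, D = pred_emb.shape
    # Scaled dot-product scores between every pair of time steps.
    scores = pred_emb @ pred_emb.T / np.sqrt(D)
    # lag[t, s] = t - s; allow only the current step and the
    # `window` steps before it (no attention to the future).
    idx = np.arange(T)
    lag = idx[:, None] - idx[None, :]
    allowed = (lag >= 0) & (lag <= window)
    scores = np.where(allowed, scores, -np.inf)
    # Row-wise softmax; every row keeps at least its diagonal entry,
    # so the row maximum is always finite.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ pred_emb, weights

# Usage with random embeddings standing in for real prediction features.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
out, w = historical_prediction_attention(emb, window=2)
```

In this sketch the first time step can only attend to itself, while later steps mix in up to `window` earlier predictions, which is one simple way to tie successive forecasts together.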