Learning Socio-Temporal Graphs for Multi-Agent Trajectory Prediction (2312.14373v1)
Abstract: To predict a pedestrian's trajectory in a crowd accurately, one must consistently account for her/his underlying socio-temporal interactions with other pedestrians. Unlike existing work that represents this information separately, partially, or implicitly, we propose a complete representation that captures and analyzes it fully and explicitly. In particular, we introduce a Directed Acyclic Graph-based structure, which we term the Socio-Temporal Graph (STG), to explicitly capture pair-wise socio-temporal interactions among a group of people across both space and time. Our model is built on a time-varying generative process whose latent variables determine the structure of the STGs. We design an attention-based model named STGformer that affords an end-to-end pipeline for learning the structure of the STGs for trajectory prediction. Our solution achieves overall state-of-the-art prediction accuracy on two large-scale benchmark datasets. Our analysis shows that a person's past trajectory is critical for predicting another person's future path, and that our model learns this relationship with a strong notion of socio-temporal locality. Statistics show that using this information explicitly for prediction yields a noticeable performance gain over trajectory-only approaches.
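The abstract describes the STG only at a high level. The sketch below is a minimal, hypothetical illustration (not the paper's method) of how pair-wise socio-temporal interactions among agents could be encoded as a directed acyclic graph over (agent, timestep) nodes. The proximity-based edge rule and the `radius` threshold are assumptions chosen for illustration; in the paper, the edge structure is determined by learned latent variables and attention rather than a fixed heuristic.

```python
import numpy as np

def build_stg_adjacency(trajectories, radius=3.0):
    """Hypothetical sketch of a socio-temporal graph (STG).

    Nodes are (agent, timestep) pairs.  A directed edge (i, t) -> (j, s)
    with t < s links agent i's past state to agent j's later state whenever
    the two agents were within `radius` of each other at time t.  Because
    edges only point forward in time, the graph is acyclic by construction.

    trajectories: array of shape (N, T, 2) holding x/y positions.
    Returns a boolean adjacency tensor of shape (N, T, N, T).
    """
    N, T, _ = trajectories.shape
    adj = np.zeros((N, T, N, T), dtype=bool)
    for t in range(T):
        # Pairwise distances between all agents at time t.
        diff = trajectories[:, t, None, :] - trajectories[None, :, t, :]
        close = np.linalg.norm(diff, axis=-1) <= radius
        for s in range(t + 1, T):  # forward-in-time edges only -> DAG
            adj[:, t, :, s] = close
    return adj

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four agents observed for eight timesteps, simulated as random walks.
    traj = np.cumsum(rng.normal(scale=0.5, size=(4, 8, 2)), axis=1)
    stg = build_stg_adjacency(traj)
    print("STG edges:", int(stg.sum()), "of", stg.size, "possible")
```

The example prints the number of forward-in-time proximity edges among the simulated agents. Replacing this fixed heuristic with per-timestep latent variables that are inferred end-to-end is, as the abstract describes, the role of the paper's generative process and the STGformer model.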