SocialCVAE: Predicting Pedestrian Trajectory via Interaction Conditioned Latents
Abstract: Pedestrian trajectory prediction is a key technology in many applications, providing insight into human behavior and anticipating future human motion. Most existing empirical models are formulated explicitly from observed human behaviors using explainable mathematical terms and are deterministic in nature, while recent work has focused on hybrid models that combine them with learning-based techniques to gain expressiveness while maintaining explainability. However, the deterministic nature of the steering behaviors learned from the empirical models limits these models' practical performance. To address this issue, this work proposes the social conditional variational autoencoder (SocialCVAE) for predicting pedestrian trajectories, which employs a CVAE to explore behavioral uncertainty in human motion decisions. SocialCVAE learns socially reasonable motion randomness by using a socially explainable interaction energy map as the CVAE's condition; the map represents the future occupancy of each pedestrian's local neighborhood area. The energy map is generated by an energy-based interaction model, which anticipates the energy cost (i.e., repulsion intensity) of pedestrians' interactions with their neighbors. Experimental results on two public benchmarks comprising 25 scenes demonstrate that SocialCVAE significantly improves prediction accuracy over state-of-the-art methods, with up to 16.85% improvement in Average Displacement Error (ADE) and 69.18% improvement in Final Displacement Error (FDE).
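The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is that distance at the final step. The function names `displacement_errors` and `relative_improvement` and the toy trajectories are hypothetical and not taken from the paper; the paper's exact evaluation protocol (for example, how multiple sampled trajectories are scored) is not stated in the abstract and is not reproduced here.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one trajectory.

    pred, truth: equally long, non-empty sequences of (x, y) positions.
    Assumes the common definitions of the two metrics.
    """
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("trajectories must be non-empty and equally long")
    # Per-step Euclidean distance between prediction and ground truth.
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # average over all time steps
    fde = dists[-1]                # error at the final time step only
    return ade, fde


def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to a baseline (hypothetical helper)."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Toy data: the prediction drifts away from the ground truth by 0, 1, 2 units.
    pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    ade, fde = displacement_errors(pred, truth)
    print(ade, fde)  # prints 1.0 2.0
```

Under these definitions, a percentage such as those quoted in the abstract would correspond to `relative_improvement(baseline_error, new_error)` computed against a baseline method's error.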