Intention-aware Denoising Diffusion Model for Trajectory Prediction (2403.09190v1)
Abstract: Trajectory prediction is an essential component of autonomous driving, particularly for collision avoidance systems. Given the inherent uncertainty of the task, numerous studies have used generative models to produce multiple plausible future trajectories for each agent. However, most of them suffer from limited representational ability or unstable training. To overcome these limitations, we propose using a diffusion model to generate the distribution of future trajectories. Two difficulties must be resolved to realize this idea. First, the diversity of intentions is intertwined with the uncertain surroundings, making the true distribution hard to parameterize. Second, the diffusion process is time-consuming at inference, making it impractical for a real-time driving system. We propose an Intention-aware denoising Diffusion Model (IDM) that tackles both problems. We decouple the original uncertainty into intention uncertainty and action uncertainty and model them with two dependent diffusion processes. To reduce inference time, we lower the variable dimensionality of the intention-aware diffusion process and restrict the initial distribution of the action-aware diffusion process, which allows fewer diffusion steps. To validate our approach, we conduct experiments on the Stanford Drone Dataset (SDD) and the ETH/UCY dataset. Our method achieves state-of-the-art results, with an FDE of 13.83 pixels on SDD and 0.36 meters on ETH/UCY. Compared with the original diffusion model, IDM reduces inference time by two-thirds. Interestingly, our experiments further reveal that introducing intention information is beneficial when modeling a diffusion process with fewer steps.
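The following is a minimal sketch (not the authors' implementation) of the two-stage idea the abstract describes: a low-dimensional "intention" diffusion that samples a goal, followed by a short, intention-conditioned "action" diffusion whose initial distribution is restricted to a coarse guess rather than pure noise. All module names, dimensions, step counts, and the straight-line initialization are illustrative assumptions layered on a standard DDPM sampler.

```python
# Hedged sketch of an intention-aware two-stage diffusion for trajectory prediction.
# Everything below (Denoiser, dimensions, schedules) is assumed for illustration.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Predicts the noise in x_t, conditioned on the timestep and a context vector."""
    def __init__(self, x_dim, cond_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + cond_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_t, t, cond):
        t_feat = t.float().unsqueeze(-1) / 100.0            # crude timestep embedding
        return self.net(torch.cat([x_t, t_feat, cond], dim=-1))

def ddpm_sample(model, shape, cond, steps, x_init=None):
    """Standard DDPM ancestral sampling with a linear beta schedule."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape) if x_init is None else x_init    # restricted init allowed
    for t in reversed(range(steps)):
        t_batch = torch.full((shape[0],), t)
        eps = model(x, t_batch, cond)
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x

# Illustrative setup: an encoded history vector, a 2-D goal, a 12-step 2-D trajectory.
batch, hist_dim, goal_dim, traj_dim = 4, 16, 2, 24
history = torch.randn(batch, hist_dim)                       # encoded past trajectory

intention_denoiser = Denoiser(goal_dim, hist_dim)            # low-dimensional: goal only
action_denoiser = Denoiser(traj_dim, hist_dim + goal_dim)    # conditioned on the goal

# Stage 1: diffuse the intention (goal point); the small dimension keeps this cheap.
goal = ddpm_sample(intention_denoiser, (batch, goal_dim), history, steps=20)

# Stage 2: short action diffusion, initialized near a coarse trajectory toward the
# sampled goal instead of pure noise, so only a few denoising steps are needed.
rough_traj = goal.repeat(1, traj_dim // goal_dim)            # placeholder coarse init
action_cond = torch.cat([history, goal], dim=-1)
trajectory = ddpm_sample(action_denoiser, (batch, traj_dim), action_cond,
                         steps=5, x_init=rough_traj + 0.1 * torch.randn_like(rough_traj))
print(trajectory.shape)                                      # torch.Size([4, 24])
```

The design choice being sketched is that the expensive many-step denoising runs only over the low-dimensional intention variable, while the high-dimensional trajectory is recovered with a handful of steps because its starting point is already close to the data manifold.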