PIP-Net: Pedestrian Intention Prediction in the Wild
Abstract: Accurate pedestrian intention prediction (PIP) by Autonomous Vehicles (AVs) remains an open research challenge. In this article, we introduce PIP-Net, a novel framework that enables AVs to predict pedestrian crossing intention in real-world urban scenarios, offered in two variants tailored to different camera mounts and setups. Leveraging both kinematic data and spatial features of the driving scene, the proposed model employs a recurrent, temporal attention-based architecture that outperforms the state of the art. To enhance the visual representation of road users and their proximity to the ego vehicle, we introduce a categorical depth feature map, combined with a local motion flow feature, providing rich insight into scene dynamics. Additionally, we explore the impact of expanding the camera's field of view from one to three cameras surrounding the ego vehicle, which improves the model's contextual perception. Depending on the traffic scenario and road environment, the model predicts pedestrian crossing intention up to 4 seconds in advance, a substantial lead time relative to current studies in pedestrian intention prediction. Finally, for the first time, we present the Urban-PIP dataset, a customised pedestrian intention prediction dataset with multi-camera annotations from real-world automated driving scenarios.
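The abstract does not spell out how the categorical depth feature map is built; as a rough illustration of the idea, the PyTorch sketch below discretises a dense depth map into one-hot proximity channels. The function name `categorical_depth_map` and the bin edges are hypothetical choices for this example, not the paper's actual scheme.

```python
import torch
import torch.nn.functional as F

def categorical_depth_map(depth: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    """Discretise a dense depth map of shape (H, W) into one-hot category channels.

    `edges` holds the upper boundaries of each depth category, so the
    output has shape (len(edges) + 1, H, W).
    """
    # bucketize maps each pixel depth to the index of its proximity category
    one_hot = F.one_hot(torch.bucketize(depth, edges), num_classes=len(edges) + 1)
    return one_hot.permute(2, 0, 1).float()  # (C, H, W) feature map

# Example: three proximity categories around the ego vehicle (illustrative edges)
depth = torch.rand(720, 1280) * 80.0           # synthetic metric depth in metres
edges = torch.tensor([10.0, 30.0])             # near (<10 m), mid (10-30 m), far (>30 m)
feature = categorical_depth_map(depth, edges)  # -> torch.Size([3, 720, 1280])
```

One plausible motivation for such a representation: discretising depth into a few proximity channels turns a continuous cue into feature maps that a convolutional encoder can fuse directly with RGB and local motion flow features.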