Social-Transmotion: Promptable Human Trajectory Prediction (2312.16168v3)
Abstract: Accurate human trajectory prediction is crucial for applications such as autonomous vehicles, robotics, and surveillance systems. Yet, existing models often fail to fully leverage the non-verbal social cues that humans subconsciously communicate when navigating a space. To address this, we introduce Social-Transmotion, a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior. We translate the idea of a prompt from NLP to the task of human trajectory prediction, where a prompt can be a sequence of x-y coordinates on the ground, bounding boxes in the image plane, or body pose keypoints in either 2D or 3D. This, in turn, augments trajectory data, leading to enhanced human trajectory prediction. Using a masking technique, our model exhibits flexibility and adaptability by capturing spatiotemporal interactions between agents based on the available visual cues. We examine the merits of using 2D versus 3D poses, as well as a limited set of poses. Additionally, we investigate the spatial and temporal attention maps to identify which keypoints and time-steps in the sequence are vital for optimizing human trajectory prediction. Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY. The code is publicly available: https://github.com/vita-epfl/social-transmotion.
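The prompt and masking idea described in the abstract can be pictured with a small sketch. This is not the authors' implementation (see the linked repository for that); it only illustrates, under stated assumptions, how several visual cues might be flattened into one token sequence with an availability mask so that missing cues are simply marked as absent. The names `CUE_DIMS` and `build_prompt`, the three-keypoint pose, and the zero-filled placeholder tokens are all hypothetical choices made for this example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical per-frame feature sizes for each cue (illustrative only):
# x-y position, 2D bounding box, and a toy 3-keypoint pose in 2D and 3D.
CUE_DIMS: Dict[str, int] = {
    "trajectory": 2,
    "bbox_2d": 4,
    "pose_2d": 3 * 2,
    "pose_3d": 3 * 3,
}

Frames = Sequence[Optional[Sequence[float]]]
Token = Tuple[str, int, List[float]]


def build_prompt(cues: Dict[str, Frames], num_frames: int) -> Tuple[List[Token], List[bool]]:
    """Flatten the cues of one agent into tokens plus an availability mask.

    One token is emitted per cue per frame. A cue that is absent entirely,
    or missing at a given frame (None), yields a zero-filled token whose
    mask entry is False, so a downstream attention layer could ignore it.
    """
    tokens: List[Token] = []
    mask: List[bool] = []
    for name, dim in CUE_DIMS.items():
        frames = cues.get(name)
        for t in range(num_frames):
            values = None if frames is None else frames[t]
            if values is None:
                tokens.append((name, t, [0.0] * dim))
                mask.append(False)
            else:
                if len(values) != dim:
                    raise ValueError(f"{name} at frame {t}: expected {dim} values")
                tokens.append((name, t, [float(v) for v in values]))
                mask.append(True)
    return tokens, mask


# Example: trajectory observed in both frames, 2D pose only in the second
# frame, no bounding boxes and no 3D pose.
example_cues = {
    "trajectory": [[0.0, 0.0], [0.5, 0.1]],
    "pose_2d": [None, [1, 2, 3, 4, 5, 6]],
}
example_tokens, example_mask = build_prompt(example_cues, num_frames=2)
```

In this sketch the same function serves an agent with only x-y coordinates and an agent with full pose information; the mask alone tells the model which tokens carry real observations.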
Authors:
- Saeed Saadatnejad
- Yang Gao
- Kaouther Messaoud
- Alexandre Alahi