Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning (2404.05218v1)
Abstract: Human pose forecasting garners attention for its diverse applications. However, challenges in modeling the multi-modal nature of human motion and intricate interactions among agents persist, particularly with longer timescales and more agents. In this paper, we propose an interaction-aware trajectory-conditioned long-term multi-agent human pose forecasting model, utilizing a coarse-to-fine prediction approach: multi-modal global trajectories are initially forecasted, followed by respective local pose forecasts conditioned on each mode. In doing so, our Trajectory2Pose model introduces a graph-based agent-wise interaction module for a reciprocal forecast of local motion-conditioned global trajectory and trajectory-conditioned local pose. Our model effectively handles the multi-modality of human motion and the complexity of long-term multi-agent interactions, improving performance in complex environments. Furthermore, we address the lack of long-term (6s+) multi-agent (5+) datasets by constructing a new dataset from real-world images and 2D annotations, enabling a comprehensive evaluation of our proposed model. State-of-the-art prediction performance on both complex and simpler datasets confirms the generalized effectiveness of our method. The code is available at https://github.com/Jaewoo97/T2P.
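The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the authors' T2P implementation (see the linked repository for that). Every function name below is hypothetical, and both learned predictors are replaced by simple stand-ins: constant-velocity extrapolation along several headings for the trajectory modes, and a held last pose turned to the mode's heading for the local pose. The sketch covers a single agent and omits the agent-wise interaction module. A vertical z axis is assumed.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the vertical (z) axis (z-up is an assumption)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def forecast_trajectories(root_history, horizon, heading_offsets):
    """Stage 1 (coarse): one global root trajectory per mode.

    Stand-in predictor: constant-velocity extrapolation, with the velocity
    rotated by each heading offset. A learned model would replace this.
    """
    (x0, y0, z0), (x1, y1, z1) = root_history[-2], root_history[-1]
    velocity = (x1 - x0, y1 - y0, z1 - z0)
    modes = []
    for offset in heading_offsets:
        vx, vy, vz = rotate_z(velocity, offset)
        modes.append([(x1 + vx * t, y1 + vy * t, z1 + vz * t)
                      for t in range(1, horizon + 1)])
    return modes


def forecast_local_pose(last_local_pose, horizon, heading_offset):
    """Stage 2 (fine): root-relative joints conditioned on one trajectory mode.

    Stand-in predictor: hold the last observed pose, turned to the mode's heading.
    """
    turned = [rotate_z(joint, heading_offset) for joint in last_local_pose]
    return [turned for _ in range(horizon)]


def forecast_poses(root_history, last_local_pose, horizon, heading_offsets):
    """Compose global poses per mode: root trajectory plus local pose."""
    trajectories = forecast_trajectories(root_history, horizon, heading_offsets)
    forecasts = []
    for offset, trajectory in zip(heading_offsets, trajectories):
        local = forecast_local_pose(last_local_pose, horizon, offset)
        forecasts.append([
            [(rx + jx, ry + jy, rz + jz) for (jx, jy, jz) in frame]
            for (rx, ry, rz), frame in zip(trajectory, local)
        ])
    return forecasts


# Usage: one agent with a single joint, two modes (straight ahead, left turn).
modes = forecast_poses(
    root_history=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    last_local_pose=[(0.0, 0.0, 1.0)],
    horizon=2,
    heading_offsets=[0.0, math.pi / 2],
)
```

The result is indexed as `modes[mode][frame][joint]`. Forecasting the trajectory modes first and then one local pose sequence per mode keeps the multi-modality in the coarse stage, which is the split the abstract describes.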
Authors: Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon