Towards more realistic human motion prediction with attention to motion coordination (2404.03584v1)

Published 4 Apr 2024 in cs.CV

Abstract: Joint relation modeling is a crucial component in human motion prediction. Most existing methods rely on skeleton-based graphs to build the joint relations, where local interactive relations between joint pairs are well learned. However, the motion coordination, a global joint relation reflecting the simultaneous cooperation of all joints, is usually weakened because it is learned from part to whole progressively and asynchronously. Thus, the final predicted motions usually appear unrealistic. To tackle this issue, we learn a medium, called coordination attractor (CA), from the spatiotemporal features of motion to characterize the global motion features, which is subsequently used to build new relative joint relations. Through the CA, all joints are related simultaneously, and thus the motion coordination of all joints can be better learned. Based on this, we further propose a novel joint relation modeling module, Comprehensive Joint Relation Extractor (CJRE), to combine this motion coordination with the local interactions between joint pairs in a unified manner. Additionally, we present a Multi-timescale Dynamics Extractor (MTDE) to extract enriched dynamics from the raw position information for effective prediction. Extensive experiments show that the proposed framework outperforms state-of-the-art methods in both short- and long-term predictions on H3.6M, CMU-Mocap, and 3DPW.
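
The coordination attractor idea can be pictured concretely. The sketch below is a minimal illustration, not the paper's implementation: it assumes the CA is an attention-weighted pooling of per-joint spatiotemporal features into one global vector, and that relative joint relations come from comparing each joint's feature against that vector so every joint is related to the whole simultaneously. The module name, layer choices, and dimensions are all hypothetical.

```python
import torch
import torch.nn as nn

class CoordinationAttractorSketch(nn.Module):
    """Hypothetical sketch of a CA-style global relation builder.

    Pools per-joint spatiotemporal features into a single global vector
    (the "attractor"), then relates every joint to that vector at once,
    avoiding the part-to-whole, asynchronous propagation the paper
    identifies as weakening motion coordination.
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)              # attention weight per joint
        self.relate = nn.Linear(2 * feat_dim, feat_dim)  # joint feature vs. attractor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, feat_dim) spatiotemporal joint features
        w = torch.softmax(self.score(x), dim=1)          # (B, J, 1), sums to 1 over joints
        ca = (w * x).sum(dim=1, keepdim=True)            # (B, 1, D) coordination attractor
        ca = ca.expand_as(x)                             # broadcast the attractor to every joint
        # relative relation of each joint to the shared global attractor
        return self.relate(torch.cat([x, ca], dim=-1))

# Example: batch of 2 poses, 17 joints, 64-dim features
feats = torch.randn(2, 17, 64)
rel = CoordinationAttractorSketch(64)(feats)             # (2, 17, 64)
```

In the paper's framework these CA-based relative relations are fused with local joint-pair interactions inside the CJRE; the fusion itself is not detailed in the abstract, so it is omitted here.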

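The MTDE is described only as extracting enriched dynamics from raw position information at multiple timescales. A plausible minimal reading, sketched below purely as an assumption rather than the authors' method, is finite-difference displacements taken at several temporal strides and concatenated with the positions; the function name, strides, and padding scheme are illustrative choices.

```python
import torch

def multi_timescale_dynamics(pos: torch.Tensor, strides=(1, 2, 4)) -> torch.Tensor:
    """Hypothetical MTDE-style feature: raw positions plus displacements
    computed at several temporal strides (coarse-to-fine dynamics).

    pos: (batch, time, num_joints, 3) raw joint positions.
    Returns (batch, time, num_joints, 3 * (1 + len(strides))).
    """
    feats = [pos]
    for s in strides:
        # displacement over s frames; zero-pad the first s frames
        diff = pos[:, s:] - pos[:, :-s]
        pad = torch.zeros_like(pos[:, :s])
        feats.append(torch.cat([pad, diff], dim=1))
    return torch.cat(feats, dim=-1)

# Example: 10 observed frames, 17 joints
x = torch.randn(2, 10, 17, 3)
dyn = multi_timescale_dynamics(x)   # (2, 10, 17, 12)
```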