
AdvMT: Adversarial Motion Transformer for Long-term Human Motion Prediction (2401.05018v2)

Published 10 Jan 2024 in cs.CV

Abstract: To achieve seamless collaboration between robots and humans in a shared environment, accurately predicting future human movements is essential. Human motion prediction has traditionally been approached as a sequence prediction problem, leveraging historical human motion data to estimate future poses. Beginning with vanilla recurrent networks, the research community has investigated a variety of methods for learning human motion dynamics, encompassing graph-based and generative approaches. Despite these efforts, achieving accurate long-term predictions continues to be a significant challenge. In this regard, we present the Adversarial Motion Transformer (AdvMT), a novel model that integrates a transformer-based motion encoder and a temporal continuity discriminator. This combination captures spatial and temporal dependencies simultaneously within frames. With adversarial training, our method effectively reduces unwanted artifacts in predictions, thereby ensuring the learning of more realistic and fluid human motions. The evaluation results indicate that AdvMT greatly enhances the accuracy of long-term predictions while also delivering robust short-term predictions.
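
The abstract only outlines the approach at a high level and the paper's code is not reproduced here. The sketch below is a minimal, hypothetical PyTorch illustration of the general pattern described: a transformer encoder over observed poses trained with a reconstruction loss plus an adversarial term from a sequence-level discriminator that scores temporal coherence. All module names (MotionEncoder, TemporalDiscriminator), dimensions, decoding choices, and loss weights are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: module names, dimensions, and loss weights are assumptions,
# not the AdvMT authors' implementation.
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """Transformer encoder over a sequence of pose vectors (hypothetical layout)."""
    def __init__(self, pose_dim=66, d_model=128, n_heads=8, n_layers=4, horizon=25):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, pose_dim)
        self.horizon = horizon

    def forward(self, past):              # past: (B, T_past, pose_dim)
        h = self.encoder(self.embed(past))
        # Naive decoding choice for this sketch: predict the future from the last token.
        last = self.head(h[:, -1:, :])
        return last.repeat(1, self.horizon, 1)

class TemporalDiscriminator(nn.Module):
    """Scores whether a motion sequence looks temporally coherent (assumed GRU-based)."""
    def __init__(self, pose_dim=66, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(pose_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, seq):               # seq: (B, T_future, pose_dim)
        _, h = self.rnn(seq)
        return self.out(h[-1])            # real/fake logit

# One adversarial training step (standard GAN recipe; weights are placeholders).
G, D = MotionEncoder(), TemporalDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce, l2 = nn.BCEWithLogitsLoss(), nn.MSELoss()

past = torch.randn(8, 50, 66)             # dummy observed motion
future = torch.randn(8, 25, 66)            # dummy ground-truth future motion

# Discriminator step: real future vs. generated future.
fake = G(past).detach()
d_loss = bce(D(future), torch.ones(8, 1)) + bce(D(fake), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: reconstruction term plus adversarial term.
pred = G(past)
g_loss = l2(pred, future) + 0.01 * bce(D(pred), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The key design idea mirrored here is that the discriminator looks at whole predicted sequences rather than individual frames, so the adversarial signal penalizes temporal discontinuities that a per-frame reconstruction loss alone would tolerate.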

Citations (1)
