Pose-aware Attention Network for Flexible Motion Retargeting by Body Part (2306.08006v1)
Abstract: Motion retargeting is a fundamental problem in computer graphics and computer vision. Existing approaches usually impose strict requirements, such as requiring the source and target skeletons to have the same number of joints or share the same topology. To tackle this problem, we observe that skeletons with different structures may still share common body parts despite differing joint counts. Following this observation, we propose a novel, flexible motion retargeting framework. The key idea of our method is to treat the body part, rather than the whole body, as the basic retargeting unit. To enhance the spatial modeling capability of the motion encoder, we introduce a pose-aware attention network (PAN) in the motion encoding phase. The PAN is pose-aware in that it dynamically predicts the joint weights within each body part based on the input pose, and then constructs a shared latent space for each body part by feature pooling. Extensive experiments show that our approach generates better motion retargeting results, both qualitatively and quantitatively, than state-of-the-art methods. Moreover, thanks to the body-part retargeting strategy and the PAN, our framework produces reasonable results even in more challenging scenarios, such as retargeting between bipedal and quadrupedal skeletons. Our code is publicly available.
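To make the pose-aware pooling idea concrete, below is a minimal PyTorch sketch of attention pooling over the joints of one body part. This is not the authors' implementation: the module name `PoseAwareAttentionPool`, the scoring MLP, and all dimensions are assumptions made purely for illustration. The property it demonstrates is the one the abstract relies on: softmax-normalized joint weights, predicted from the per-joint pose features, let body parts with different joint counts map into a fixed-size shared latent.

```python
# Illustrative sketch only -- not the paper's implementation.
# All names and sizes here are assumptions for exposition.
import torch
import torch.nn as nn


class PoseAwareAttentionPool(nn.Module):
    """Pools per-joint features of one body part into a single latent vector,
    with joint weights predicted dynamically from the pose-dependent features."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # A small scorer applied to each joint feature independently, so the
        # module works for body parts with any number of joints.
        self.score_net = nn.Sequential(
            nn.Linear(feat_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_joints, feat_dim) per-joint encoder features
        scores = self.score_net(feats).squeeze(-1)   # (batch, num_joints)
        weights = torch.softmax(scores, dim=-1)      # attention over the part's joints
        # Weighted sum over joints -> fixed-size latent for this body part.
        return (weights.unsqueeze(-1) * feats).sum(dim=1)  # (batch, feat_dim)


if __name__ == "__main__":
    pool = PoseAwareAttentionPool(feat_dim=32)
    arm_feats = torch.randn(2, 4, 32)  # a 4-joint "arm" part
    leg_feats = torch.randn(2, 7, 32)  # a 7-joint part from another skeleton
    # Both pool to the same latent size, enabling a shared per-part space.
    print(pool(arm_feats).shape, pool(leg_feats).shape)  # (2, 32) (2, 32)
```

Because the scorer is shared across joints and the weights are normalized within each part, two skeletons whose corresponding parts have different joint counts still produce latents of the same size, which is one plausible way to realize the shared per-part latent space the abstract describes.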