Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
158 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Semantics-aware Motion Retargeting with Vision-Language Models (2312.01964v3)

Published 4 Dec 2023 in cs.CV and cs.GR

Abstract: Capturing and preserving motion semantics is essential to motion retargeting between animation characters. However, most of the previous works neglect the semantic information or rely on human-designed joint-level representations. Here, we present a novel Semantics-aware Motion reTargeting (SMT) method with the advantage of vision-LLMs to extract and maintain meaningful motion semantics. We utilize a differentiable module to render 3D motions. Then the high-level motion semantics are incorporated into the motion retargeting process by feeding the vision-LLM with the rendered images and aligning the extracted semantic embeddings. To ensure the preservation of fine-grained motion details and high-level semantics, we adopt a two-stage pipeline consisting of skeleton-aware pre-training and fine-tuning with semantics and geometry constraints. Experimental results show the effectiveness of the proposed method in producing high-quality motion retargeting results while accurately preserving motion semantics.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (32)
  1. Adobe’s mixamo. https://www.mixamo.com/. Accessed: 2023-02-08.
  2. Skeleton-aware networks for deep motion retargeting. ACM Transactions on Graphics (TOG), 39(4):62–1, 2020.
  3. Scanqa: 3d question answering for spatial scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  4. Online motion retargetting. The Journal of Visualization and Computer Animation, 11(5):223–235, 2000.
  5. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416, 2022.
  6. Michael Gleicher. Retargetting motion to new characters. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 33–42, 1998.
  7. Action2motion: Conditioned generation of 3d human motions. In Proceedings of the 28th ACM International Conference on Multimedia, pages 2021–2029, 2020.
  8. Generating diverse and natural 3d human motions from text. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5152–5161, 2022.
  9. Pose-aware attention network for flexible motion retargeting by body part. IEEE Transactions on Visualization and Computer Graphics, pages 1–17, 2023.
  10. Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1325–1339, 2014.
  11. A hierarchical approach to interactive motion editing for human-like figures. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 39–48, 1999.
  12. Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 165–172, 2000.
  13. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597, 2023a.
  14. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models. In ICML, 2023b.
  15. Visual semantic reasoning for image-text matching. In ICCV, 2019.
  16. Pmnet: Learning of disentangled pose and movement for unsupervised motion retargeting. In BMVC, page 7, 2019.
  17. Swinbert: End-to-end transformers with sparse attention for video captioning. In CVPR, 2022.
  18. Soft rasterizer: A differentiable renderer for image-based 3d reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7708–7717, 2019.
  19. Accurate 3d hand pose estimation for whole-body 3d human mesh estimation. In Computer Vision and Pattern Recognition Workshop (CVPRW), 2022.
  20. 3d human pose estimation in video with temporal convolutions and semi-supervised training. In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  21. Physically based motion transformation. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 11–20, 1999.
  22. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
  23. Motionclip: Exposing human motion generation to clip space. In European Conference on Computer Vision, pages 358–374. Springer, 2022a.
  24. Human motion diffusion model. arXiv preprint arXiv:2209.14916, 2022b.
  25. Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008.
  26. Neural kinematic networks for unsupervised motion retargetting. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8639–8648, 2018.
  27. Contact-aware retargeting of skinned motion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9720–9729, 2021.
  28. Sat: 2d semantics assisted training for 3d visual grounding. In ICCV, 2021.
  29. Skinned motion retargeting with residual perception of motion semantics & geometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13864–13872, 2023.
  30. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.
  31. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023a.
  32. 3d-vista: Pre-trained transformer for 3d vision and text alignment. ICCV, 2023b.
Citations (2)

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

Youtube Logo Streamline Icon: https://streamlinehq.com