FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing (2312.15004v1)

Published 22 Dec 2023 in cs.CV

Abstract: Text-driven motion generation has achieved substantial progress with the emergence of diffusion models. However, existing methods still struggle to generate complex motion sequences that correspond to fine-grained descriptions depicting detailed and accurate spatio-temporal actions. This lack of fine controllability limits the use of motion generation by a larger audience. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions with spatio-temporal composition according to user instructions. Specifically, FineMoGen builds upon a diffusion model with a novel transformer architecture dubbed Spatio-Temporal Mixture Attention (SAMI). SAMI optimizes the generation of the global attention template from two perspectives: 1) explicitly modeling the constraints of spatio-temporal composition; and 2) utilizing sparsely-activated mixture-of-experts to adaptively extract fine-grained features. To facilitate a large-scale study of this new fine-grained motion generation task, we contribute the HuMMan-MoGen dataset, which consists of 2,968 videos and 102,336 fine-grained spatio-temporal descriptions. Extensive experiments validate that FineMoGen achieves superior motion generation quality over state-of-the-art methods. Notably, FineMoGen further enables zero-shot motion editing with the aid of modern Large Language Models (LLMs), faithfully manipulating motion sequences according to fine-grained instructions. Project Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html
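
The abstract describes SAMI only at a high level. To make the sparsely-activated mixture-of-experts idea it mentions concrete, below is a minimal PyTorch sketch of such a layer: each motion-feature token is routed to its top-k experts, so only a small subset of parameters is active per token. This is an illustrative sketch under assumed shapes and names (SparseMoEFeedForward, the gating linear, and the expert widths are all hypothetical), not the paper's actual implementation.

import torch
import torch.nn as nn

class SparseMoEFeedForward(nn.Module):
    """Illustrative sparsely-activated mixture-of-experts layer (hypothetical,
    not FineMoGen's actual module): each token is routed to its top_k experts."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # router: token -> expert scores
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) motion features
        scores = self.gate(x)                           # (B, T, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top_k experts
        weights = weights.softmax(dim=-1)               # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoEFeedForward(dim=64)
motion = torch.randn(2, 120, 64)   # 2 sequences of 120 motion frames
print(layer(motion).shape)         # torch.Size([2, 120, 64])

As the usage lines show, the layer preserves the (batch, frames, dim) shape; increasing num_experts grows model capacity without increasing per-token compute, which is the property the abstract attributes to sparse activation.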

Authors (6)
  1. Mingyuan Zhang (41 papers)
  2. Huirong Li (2 papers)
  3. Zhongang Cai (50 papers)
  4. Jiawei Ren (33 papers)
  5. Lei Yang (372 papers)
  6. Ziwei Liu (368 papers)
Citations (26)
