- The paper presents a two-stream motion Transformer model that synthesizes diverse dance sequences from music inputs.
- It leverages a large-scale YouTube dataset and novel evaluation metrics for physical plausibility, beat consistency, and motion diversity.
- Experimental results demonstrate that the model outperforms acLSTM and ChorRNN in generating complex, synchronized dance movements.
Overview of "Learning to Generate Diverse Dance Motions with Transformer"
The paper "Learning to Generate Diverse Dance Motions with Transformer" presents a novel approach to dance motion synthesis that leverages the capabilities of Transformer models to generate diverse and complex dance movements from music inputs. The authors introduce a comprehensive framework that addresses multiple challenges inherent in dance motion synthesis, such as limited data diversity and the requirement for manual data handling in existing methods.
Key Contributions
- Large-Scale Data Collection: The authors overcome the constraint of limited motion capture data by building a large-scale, diverse dance motion dataset from YouTube videos. This dataset, encompassing 50 hours of synchronized music and dance pose sequences, mitigates the lack of diversity typical of smaller datasets such as the CMU mocap dataset.
- Two-Stream Motion Transformer Model: A cornerstone of the approach is a two-stream motion transformer model designed to capture long-term dependencies in motion and to ensure diverse motion generation (a minimal architectural sketch appears after this list). The model learns the motion distribution over discrete pose representations, which improves upon the deterministic motion representations used in previous work.
- Evaluation Metrics: The paper introduces new evaluation metrics to assess the quality of synthesized dance motions. These include physical plausibility, assessed with a virtual humanoid in the Bullet physics simulator; beat consistency, which checks that dance motions align with music beats; and dance diversity, which quantifies the variation among generated motions (illustrative sketches of the beat consistency and diversity scores follow the list below).
- Experimental Validation: The authors conduct extensive experiments demonstrating that their model outperforms existing methods such as acLSTM and ChorRNN, both qualitatively and quantitatively. They show that the system can efficiently generate plausible and diverse dance sequences across a variety of music inputs.
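To make the two-stream idea concrete, here is a minimal PyTorch-style sketch, not the authors' implementation: it assumes one Transformer encoder per stream (discrete pose tokens and per-frame music features), a cross-attention decoder that conditions motion on music, and a categorical output over the next pose token. The class name, feature dimensions, and fusion scheme are illustrative assumptions, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

class TwoStreamMotionTransformer(nn.Module):
    """Illustrative two-stream sketch: separate encoders for the motion and music
    streams, fused by a decoder that autoregressively predicts discrete pose tokens.
    Hyperparameters and fusion are assumptions, not the paper's exact design."""

    def __init__(self, n_pose_tokens=512, music_dim=438, d_model=256, n_heads=8, n_layers=6):
        super().__init__()
        self.pose_embed = nn.Embedding(n_pose_tokens, d_model)   # discrete pose representation
        self.music_proj = nn.Linear(music_dim, d_model)          # per-frame audio features
        self.motion_stream = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.music_stream = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.head = nn.Linear(d_model, n_pose_tokens)            # logits over the next pose token

    def forward(self, pose_tokens, music_feats):
        # pose_tokens: (B, T) integer pose codes; music_feats: (B, T, music_dim)
        motion = self.motion_stream(self.pose_embed(pose_tokens))
        music = self.music_stream(self.music_proj(music_feats))
        T = pose_tokens.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=pose_tokens.device), 1)
        out = self.decoder(motion, music, tgt_mask=causal)       # motion attends to music
        return self.head(out)                                    # (B, T, n_pose_tokens) logits
```

Because each step outputs a distribution over pose tokens, sampling from it (rather than taking an argmax) is what lets a single music clip yield many distinct dance sequences.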
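The beat-related metrics can likewise be sketched without reproducing the paper's exact formulation. In the sketch below, kinematic beats are taken as local minima of mean joint speed, the beat consistency score is the fraction of music beats with a kinematic beat within a small tolerance, and diversity is the mean pairwise distance between motions sampled for the same clip; the input formats, the tolerance, and the scoring rules are assumptions for illustration. Music beat times could come from any audio beat tracker (e.g., librosa).

```python
import numpy as np

def kinematic_beats(joint_pos, fps=30.0):
    """Detect kinematic beats as local minima of mean joint speed.
    joint_pos: (T, J, 3) array of joint positions per frame (assumed input format)."""
    vel = np.linalg.norm(np.diff(joint_pos, axis=0), axis=-1).mean(axis=-1)  # (T-1,) mean speed
    is_min = (vel[1:-1] < vel[:-2]) & (vel[1:-1] < vel[2:])                  # local minima
    return (np.where(is_min)[0] + 1) / fps                                   # beat times in seconds

def beat_consistency(music_beats, joint_pos, fps=30.0, tol=0.2):
    """Fraction of music beats that have a kinematic beat within `tol` seconds.
    Illustrative alignment score, not the paper's exact metric."""
    kin = kinematic_beats(joint_pos, fps)
    if len(kin) == 0:
        return 0.0
    hits = [np.min(np.abs(kin - b)) <= tol for b in music_beats]
    return float(np.mean(hits))

def motion_diversity(motions):
    """Mean pairwise L2 distance between motions generated for the same music clip.
    motions: (N, T, D) array of N sampled motion sequences (illustrative measure)."""
    flat = motions.reshape(len(motions), -1)
    dists = [np.linalg.norm(flat[i] - flat[j])
             for i in range(len(flat)) for j in range(i + 1, len(flat))]
    return float(np.mean(dists)) if dists else 0.0
```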
Implications of the Research
The research introduced in this paper has substantial implications for automated animation synthesis, particularly in shortening the production cycle for digital dance performances. By bypassing the traditional reliance on costly and labor-intensive motion capture systems, the method enables quicker and more efficient production of dance animations.
Theoretically, the introduction of a two-stream transformer model represents a significant step forward in generative modeling for motion synthesis. This model can potentially be applied to other domains requiring complex temporal dependencies and diverse generation, such as gesture generation or avatar animation in video games.
Future Directions
Future work may focus on enhancing the diversity and realism of generated dance motions by incorporating additional audio features such as lyrics or instrumentation. Furthermore, more detailed motion features, such as facial expressions or finger animation, could be integrated to augment the expressiveness of synthesized dance movements.
Overall, the paper presents a robust framework for dance motion synthesis that enhances both the diversity and computational efficiency of motion generation, paving the way for broader practical applications in virtual entertainment and interactive digital media.