- The paper's main contribution is a hierarchical framework integrating pose, motif, and choreography levels to generate long-term, music-synced dance sequences.
- It employs an auto-conditioned LSTM, a motion perceptual loss, and an AdaIN layer to control joint rotations and style variations precisely, outperforming baseline models.
- The method's success across multiple dance genres underscores its potential for advancing digital animation, choreography tools, and virtual reality applications.
Music-Driven Motion Synthesis with Global Structure
The paper, "Rhythm is a Dancer: Music-Driven Motion Synthesis with Global Structure," addresses a complex challenge in the field of computer graphics and animation: generating long-term human motion sequences, such as dance, that are synchronized with input music and maintain a global choreographic structure. Most existing methods focus on local pose transitions and do not adequately capture the long-term thematic coherence of dance.
Framework Overview
This research introduces a hierarchical music-driven motion synthesis framework comprising three levels: pose, motif, and choreography. Each level serves a distinct purpose, and together they ensure the synthesized dance respects both the beat of the music and the choreographic conventions of a specific dance genre.
- Pose Level: This component uses an auto-conditioned LSTM network to generate sequences of poses. The network incorporates joint rotations (expressed as quaternions) for precise animation output and is conditioned on musical rhythm, motif representations, and foot contact labels.
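The auto-conditioning idea behind the pose network can be illustrated with a minimal sketch: during training, the input sequence alternates between ground-truth frames and the network's own earlier predictions, so the model learns to recover from its own drift over long sequences. The function name, arguments, and schedule lengths below are illustrative, not taken from the paper's implementation.

```python
def auto_conditioned_inputs(ground_truth, predictions, gt_len, self_len):
    """Interleave ground-truth and self-generated frames.

    Auto-conditioned training alternates between feeding `gt_len`
    ground-truth frames and `self_len` of the network's own
    predictions, teaching the LSTM to correct accumulated error.
    All names here are hypothetical, for illustration only.
    """
    inputs = []
    period = gt_len + self_len
    for t in range(len(ground_truth)):
        if t % period < gt_len:
            inputs.append(ground_truth[t])   # teacher-forced frame
        else:
            inputs.append(predictions[t])    # network's own output
    return inputs
```

With `gt_len=2, self_len=2`, the schedule feeds two real frames, then two self-generated frames, and repeats; longer self-conditioned spans make the model more robust at inference time.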
- Motif Level: Here, sets of consecutive poses are guided by a novel motion perceptual loss, ensuring the movements adhere to a specific motif cluster. The pose sequences are modulated using an Adaptive Instance Normalization (AdaIN) layer, driven by the spectral features of the music, to introduce diverse and style-consistent variations.
- Choreography Level: At this highest level, the synthesis process maintains global consistency by driving the sequence of motions to follow a particular dance genre. This level ensures that the overall structure of the generated dance aligns with the genre's signature through a controlled selection of motifs, guided by a choreography model represented as a bag-of-motifs.
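One way to picture choreography-level control is greedy motif selection against a bag-of-motifs target: at each step, pick the motif whose inclusion moves the running motif histogram closest to the genre's template distribution. This is an illustrative mechanism only; the paper's actual selection procedure may differ.

```python
def next_motif(counts, template, total):
    """Pick the motif whose selection moves the generated
    bag-of-motifs histogram closest to the genre template.

    counts   : motifs used so far, indexed by motif id
    template : target genre distribution (sums to 1)
    total    : number of motifs generated so far
    A greedy sketch of choreography-level guidance (hypothetical).
    """
    best, best_err = None, float("inf")
    for m in range(len(template)):
        trial = list(counts)
        trial[m] += 1
        # L1 distance between the trial histogram and the template
        err = sum(abs(c / (total + 1) - template[i])
                  for i, c in enumerate(trial))
        if err < best_err:
            best, best_err = m, err
    return best
```

For example, if motif 0 has already been used twice under a uniform two-motif template, the selector chooses motif 1 to rebalance the histogram toward the genre signature.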
Method Evaluation
The effectiveness of the proposed framework is demonstrated across several distinct dance genres, including salsa (for both leader and follower roles), Greek folk, and modern dance. The researchers validate their method through various evaluations, including beat-to-motion synchronization, style variation according to audio spectral features, and the synthesis of diverse motion outputs from identical music inputs.
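Beat-to-motion synchronization is commonly scored as the fraction of musical beats that have a kinematic motion beat (e.g., a local minimum of joint velocity) nearby in time. The sketch below shows this generic proxy; it is not claimed to be the paper's exact metric.

```python
def beat_alignment(music_beats, motion_beats, tol=0.1):
    """Fraction of musical beats matched by a motion beat within
    `tol` seconds. Beat times are in seconds; `tol` is a
    hypothetical tolerance window, not a value from the paper.
    """
    if not music_beats:
        return 0.0
    hits = sum(
        1 for b in music_beats
        if any(abs(b - m) <= tol for m in motion_beats)
    )
    return hits / len(music_beats)
```

A score near 1.0 indicates the generated motion accents nearly every musical beat; a score near 0.0 indicates the motion is rhythmically unrelated to the music.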
The framework's capability to preserve the global structure of a dance is quantitatively assessed by comparing generated motion signatures to template signatures derived from training data. The results reveal that the method maintains a closer alignment to the intended choreographic style than baseline models, highlighting its robustness in replicating realistic and contextually meaningful dance sequences.
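The signature comparison described above can be sketched as a similarity between two motif-frequency histograms: one accumulated from the generated motion, one derived from the training data for the target genre. Cosine similarity is used here as an illustrative choice of distance, not necessarily the paper's.

```python
import math

def signature_similarity(generated, template):
    """Cosine similarity between a generated motion signature and a
    genre template signature, both given as motif-frequency
    histograms. Higher values mean the synthesized dance stays
    closer to the genre's choreographic structure (illustrative).
    """
    dot = sum(g * t for g, t in zip(generated, template))
    ng = math.sqrt(sum(g * g for g in generated))
    nt = math.sqrt(sum(t * t for t in template))
    return dot / (ng * nt) if ng and nt else 0.0
```

Identical histograms score 1.0, while a generated dance that uses entirely different motifs from the template scores 0.0.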
Implications and Future Directions
The implications of this research extend broadly into digital character animation and autonomous dance generation. The ability to integrate global choreographic structures into motion synthesis not only facilitates realistic character animation in virtual environments but also opens avenues for creative choreography tools that can aid in dance composition and performance planning.
Future research can build upon this work by expanding the dataset to encompass more dance styles and incorporating additional features such as tempo changes, style transfers across genres, and interaction dynamics in group performances. The potential to utilize this framework in augmented and virtual reality applications, gaming, and training simulations represents a significant advancement in automated human motion generation.