Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment (2403.18811v1)
Abstract: We introduce a new task in 3D dance generation, dance accompaniment, which requires generating responsive movements for a dance partner, the "follower", synchronized with both the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet scenario involves a much higher degree of interaction between the two participants, demanding delicate coordination in both pose and position. To support this task, we first build DD100, a large-scale and diverse duet interactive dance dataset, by recording about 117 minutes of professional dancers' performances. To address the challenges of this task, we propose Duolando, a GPT-based model that autoregressively predicts the next tokenized follower motion conditioned on coordinated information from the music, the leader's movements, and the follower's own past movements. To further improve the GPT's ability to generate stable results under unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that lets the model explore viable trajectories from out-of-distribution samples, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.
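The two ideas the abstract combines can be illustrated with a minimal sketch: (1) quantizing continuous follower poses into discrete tokens via a learned codebook (VQ-style), and (2) improving an autoregressive next-token policy with a reward-weighted, importance-sampled off-policy update, where trajectories come from a fixed behavior policy rather than the current one. All names, dimensions, and the reward function below are hypothetical toy stand-ins, not the paper's actual architecture or rewards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "choreographic memory": 8 codes over 4-D pose features (hypothetical sizes).
CODEBOOK = rng.normal(size=(8, 4))

def tokenize(pose):
    """Map a continuous pose vector to its nearest codebook index (VQ-style)."""
    return int(np.argmin(np.linalg.norm(CODEBOOK - pose, axis=1)))

def policy_probs(prev_token, theta):
    """Softmax over next-token logits, conditioned only on the previous token
    (a tiny stand-in for the full GPT's autoregressive conditioning)."""
    z = theta[prev_token]
    e = np.exp(z - z.max())
    return e / e.sum()

def reward(token, leader_token):
    """Hypothetical human-defined reward: favor tokens matching the leader."""
    return 1.0 if token == leader_token else -0.1

def off_policy_update(theta, behavior_theta, leader_tokens, lr=0.5, steps=200):
    """Reward-weighted policy-gradient update on trajectories sampled from a
    *fixed* behavior policy, corrected by importance weights (off-policy)."""
    for _ in range(steps):
        prev = 0
        for lead in leader_tokens:
            b = policy_probs(prev, behavior_theta)   # behavior policy samples
            a = rng.choice(8, p=b)                   # out-of-distribution w.r.t. theta
            p = policy_probs(prev, theta)
            w = p[a] / b[a]                          # importance weight
            grad = -p
            grad[a] += 1.0                           # gradient of log p[a]
            theta[prev] += lr * w * reward(a, lead) * grad
            prev = a
    return theta

# A short leader token sequence and an update of the follower policy.
logits = np.zeros((8, 8))                            # logits[prev] -> next-token scores
leader_seq = [tokenize(CODEBOOK[i] + 0.01) for i in (2, 5, 5, 3)]
theta = off_policy_update(logits.copy(), logits, leader_seq)
```

The key off-policy ingredient is the importance weight `w = p[a] / b[a]`: actions are drawn from the behavior policy (here, the uniform initial logits), yet the update still estimates the gradient for the current policy, which is what allows learning from trajectories the current policy would not itself have sampled.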