DFM: Deep Fourier Mimic for Expressive Dance Motion Learning

Published 16 Feb 2025 in cs.RO | (2502.10980v2)

Abstract: As entertainment robots gain popularity, the demand for natural and expressive motion, particularly in dancing, continues to rise. Traditionally, dancing motions have been manually designed by artists, a process that is both labor-intensive and restricted to simple motion playback, lacking the flexibility to incorporate additional tasks such as locomotion or gaze control during dancing. To overcome these challenges, we introduce Deep Fourier Mimic (DFM), a novel method that combines advanced motion representation with Reinforcement Learning (RL) to enable smooth transitions between motions while concurrently managing auxiliary tasks during dance sequences. While previous frequency domain based motion representations have successfully encoded dance motions into latent parameters, they often impose overly rigid periodic assumptions at the local level, resulting in reduced tracking accuracy and motion expressiveness, which is a critical aspect for entertainment robots. By relaxing these locally periodic constraints, our approach not only enhances tracking precision but also facilitates smooth transitions between different motions. Furthermore, the learned RL policy that supports simultaneous base activities, such as locomotion and gaze control, allows entertainment robots to engage more dynamically and interactively with users rather than merely replaying static, pre-designed dance routines.


Authors (3)

Summary

The paper introduces DFM, which combines a frequency-domain motion representation with reinforcement learning to generate smooth, expressive dance motions, reaching a mean absolute tracking error of 0.094 rad compared with 0.132 rad for the Fourier Latent Dynamics (FLD) baseline.
The method adapts FLD by relaxing its strong local periodicity assumptions, which lets it capture detailed non-periodic movement features.
The approach supports multitask performance by allowing robots to execute dance, locomotion, and gaze control simultaneously for enhanced interactivity.

DFM: Deep Fourier Mimic for Expressive Dance Motion Learning

The paper "DFM: Deep Fourier Mimic for Expressive Dance Motion Learning" introduces an approach for making dance motions more expressive and interactive on entertainment robots such as Sony's aibo. Deep Fourier Mimic (DFM) combines a frequency-domain motion representation with reinforcement learning (RL) to address the rigidity of traditional handcrafted dance routines, which are limited to simple motion playback. The resulting policy lets the robot execute complex dance sequences with smooth transitions while simultaneously performing auxiliary tasks such as locomotion and gaze control.

The research is motivated by the growing demand for more natural and interactive movement in entertainment robotics, where dance is a key medium for conveying emotion and engaging audiences. Traditionally, these motions are crafted manually by designers, a labor-intensive process that yields routines lacking the flexibility needed for interactive tasks. The paper shows that relaxing local periodicity assumptions improves both the tracking accuracy and the expressiveness of robot dances, overcoming limitations of earlier periodic motion representations.

DFM relies on a learning-based pipeline with four stages: motion design, motion representation, motion learning, and hardware inference. The core of the representation stage is an adapted version of Fourier Latent Dynamics (FLD), a method originally tuned for periodic motions that decomposes motion sequences using phase and frequency encoding. Unlike FLD, DFM does not impose strong periodic assumptions on its encoding, which lets it capture more detailed, expressive non-periodic features of dance motions. Comparisons against baseline methods show improved tracking accuracy, particularly for non-periodic, highly expressive motions.
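To make the role of the periodicity assumption concrete, the following toy sketch contrasts a single global sinusoid fit with per-window fits on a signal whose frequency and amplitude change halfway through. It is not the DFM or FLD encoder, both of which are learned models; it assumes a plain FFT-based estimate of frequency, amplitude, phase, and offset, and the function names `sinusoid_params` and `reconstruct`, the 50 Hz sampling rate, and the test signal are all hypothetical choices made for illustration.

```python
import numpy as np

def sinusoid_params(window, dt):
    """Return (frequency, amplitude, phase, offset) of the dominant sinusoid."""
    n = len(window)
    offset = window.mean()
    spectrum = np.fft.rfft(window - offset)
    freqs = np.fft.rfftfreq(n, d=dt)
    k = 1 + int(np.argmax(np.abs(spectrum[1:])))  # skip the DC bin
    amplitude = 2.0 * np.abs(spectrum[k]) / n
    phase = np.angle(spectrum[k])
    return freqs[k], amplitude, phase, offset

def reconstruct(params, n, dt):
    """Rebuild n samples of a single sinusoid from its four parameters."""
    frequency, amplitude, phase, offset = params
    t = np.arange(n) * dt
    return amplitude * np.cos(2.0 * np.pi * frequency * t + phase) + offset

# Hypothetical joint trajectory: 1 Hz for two seconds, then 2 Hz (sampled at 50 Hz).
dt = 0.02
idx = np.arange(200)
t = idx * dt
signal = np.where(
    idx < 100,
    0.5 * np.sin(2.0 * np.pi * 1.0 * t),
    0.3 * np.sin(2.0 * np.pi * 2.0 * t),
)

# One parameter set for the whole sequence (strong periodic assumption).
global_fit = reconstruct(sinusoid_params(signal, dt), len(signal), dt)

# Parameters re-estimated for every one-second window (relaxed assumption).
window = 50
local_fit = np.concatenate([
    reconstruct(sinusoid_params(signal[i:i + window], dt), window, dt)
    for i in range(0, len(signal), window)
])

global_mae = float(np.mean(np.abs(signal - global_fit)))
local_mae = float(np.mean(np.abs(signal - local_fit)))
print(f"global MAE: {global_mae:.4f}  local MAE: {local_mae:.2e}")
```

In this toy case the per-window fit follows the change in frequency and amplitude, while the single global fit cannot, which mirrors the qualitative argument for relaxing locally periodic constraints.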

Specifically, DFM achieves a mean absolute tracking error (MAE) of 0.094 rad across 170 motion evaluations on a real aibo unit, compared with 0.132 rad for FLD, a reduction of roughly 29%. The improvement stems from DFM's ability to update latent parameters dynamically during motion representation, accommodating variations in frequency, amplitude, and motion transitions that FLD's periodic assumptions could not handle effectively.
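As a small worked example of the metric, the sketch below computes a mean absolute error between reference and measured joint angles and the relative reduction implied by the two reported figures. How the paper aggregates the error (over joints, time steps, and motions) is not stated here, so the uniform average in `mean_absolute_error` is an assumption, and both function names are hypothetical.

```python
import numpy as np

def mean_absolute_error(reference, measured):
    """Average |reference - measured| over all time steps and joints (radians)."""
    reference = np.asarray(reference, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(reference - measured)))

def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline

# Made-up joint angles (2 time steps x 2 joints), only to exercise the function.
example_mae = mean_absolute_error([[0.0, 0.1], [0.2, 0.3]],
                                  [[0.1, 0.1], [0.2, 0.1]])

# Reported figures: 0.132 rad (FLD) versus 0.094 rad (DFM).
reduction = relative_reduction(0.132, 0.094)
print(f"example MAE: {example_mae:.3f} rad, reduction: {reduction:.1%}")
```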

Furthermore, DFM supports multi-task operation, executing expressive dance simultaneously with auxiliary tasks such as locomotion and gaze control. This is enabled by a reward structure that includes terms for both the locomotion and head orientation tasks. For instance, in locomotion tasks, aibo rotates in place while mimicking dance steps; in gaze tasks, it adjusts its head and limb movements to track target directions while maintaining expressiveness in its dance.
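One common way to build such a reward in RL-based motion imitation is a weighted sum of a motion-tracking term and task terms. The sketch below illustrates that general pattern only: the exponential kernels, the weights, the `sigma` values, and every name in it (`multitask_reward`, `yaw_rate_cmd`, `gaze_target_dir`, and so on) are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def exp_kernel(squared_error, sigma):
    """Map a squared error to (0, 1]; equals 1 when the error is zero."""
    return float(np.exp(-squared_error / sigma))

def multitask_reward(q, q_ref, yaw_rate, yaw_rate_cmd, head_dir, gaze_target_dir,
                     weights=(0.6, 0.2, 0.2)):
    """Hypothetical weighted sum of dance-tracking, locomotion, and gaze terms."""
    w_track, w_loco, w_gaze = weights
    q, q_ref = np.asarray(q, dtype=float), np.asarray(q_ref, dtype=float)
    head_dir = np.asarray(head_dir, dtype=float)
    gaze_target_dir = np.asarray(gaze_target_dir, dtype=float)

    # Dance tracking: joint angles should follow the reference motion.
    r_track = exp_kernel(np.sum((q - q_ref) ** 2), sigma=0.5)
    # Locomotion: base yaw rate should follow the commanded rate.
    r_loco = exp_kernel((yaw_rate - yaw_rate_cmd) ** 2, sigma=0.25)
    # Gaze: head direction should align with the target direction.
    cos_angle = np.dot(head_dir, gaze_target_dir) / (
        np.linalg.norm(head_dir) * np.linalg.norm(gaze_target_dir))
    r_gaze = 0.5 * (1.0 + float(np.clip(cos_angle, -1.0, 1.0)))

    return w_track * r_track + w_loco * r_loco + w_gaze * r_gaze

# Perfect tracking on all three terms versus a pose that looks the wrong way.
perfect = multitask_reward([0.1, 0.2], [0.1, 0.2], 0.5, 0.5, [1, 0, 0], [1, 0, 0])
off_gaze = multitask_reward([0.1, 0.2], [0.1, 0.2], 0.5, 0.5, [1, 0, 0], [-1, 0, 0])
print(perfect, off_gaze)
```

The weights control the trade-off between dance fidelity and the auxiliary tasks; a larger tracking weight favors expressiveness, while larger task weights favor command following.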

This research has implications for the development of autonomous, interactive robots in entertainment and beyond. With advanced motion representation and learning strategies, robots can perform a broader range of expressive tasks without sacrificing accuracy or interaction quality. Possible future directions include extending DFM to more diverse robotic platforms and exploring more complex interactions in human-robot collaborative settings. The study also opens new possibilities in robotic arts and entertainment, where interaction and engagement are increasingly valued.
