- The paper introduces a novel neural network featuring an integrated forward kinematics layer and cycle consistency adversarial training to enable unsupervised motion retargetting.
- The method employs an encoder-decoder RNN architecture to adapt motion sequences across characters with varying skeletal structures, preserving natural movement.
- Empirical results on the Mixamo dataset show state-of-the-art performance with reduced joint errors and effective 3D animation synthesis from monocular videos.
Neural Kinematic Networks for Unsupervised Motion Retargetting
The paper "Neural Kinematic Networks for Unsupervised Motion Retargetting" introduces a novel approach to the problem of motion retargetting, specifically focusing on adapting movements from one character to another with differing kinematic structures, such as varying bone lengths. The solution leverages a neural network architecture integrated with a forward kinematics layer and adopts a cycle consistency-based adversarial training framework.
Overview of the Approach
The authors propose a recurrent neural network (RNN) architecture consisting of encoder and decoder components coupled through a forward kinematics layer. The encoder maps an input motion sequence to a motion feature representation; the decoder translates these features into joint rotations for a target skeleton with its own kinematic configuration; and the forward kinematics layer converts those rotations into joint positions on that skeleton. This bridges input and target motions, allowing motion to be adapted across characters without direct supervision, as sketched below.
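To make the idea concrete, here is a minimal PyTorch sketch of a differentiable forward kinematics layer: per-joint rotations and the target skeleton's bone offsets are turned into global joint positions by traversing the kinematic tree. The rotation-matrix parameterization and the function signature are assumptions for illustration; the paper's layer may parameterize rotations differently.

```python
import torch

def forward_kinematics(rotations, offsets, parents, root_position):
    """Compute global joint positions from per-joint local rotations.

    rotations:     (J, 3, 3) local rotation matrices, one per joint
    offsets:       (J, 3) rest-pose bone offsets of the *target* skeleton
                   (offsets[j] points from the parent joint to joint j)
    parents:       list of parent indices; parents[0] == -1 for the root
    root_position: (3,) global translation of the root joint
    """
    num_joints = rotations.shape[0]
    global_rot = [None] * num_joints
    global_pos = [None] * num_joints
    global_rot[0] = rotations[0]
    global_pos[0] = root_position
    for j in range(1, num_joints):
        p = parents[j]
        # Accumulate rotation down the kinematic chain, then place joint j
        # at its parent's position plus the parent-rotated bone offset.
        global_rot[j] = global_rot[p] @ rotations[j]
        global_pos[j] = global_pos[p] + global_rot[p] @ offsets[j]
    return torch.stack(global_pos)  # (J, 3) global joint positions

# Example: a 3-joint chain (root -> elbow -> wrist) in its rest pose.
rotations = torch.eye(3).repeat(3, 1, 1)  # identity rotations
offsets = torch.tensor([[0., 0., 0.],
                        [0., 1., 0.],
                        [0., 1., 0.]])
positions = forward_kinematics(rotations, offsets, [-1, 0, 1], torch.zeros(3))
```

Because every operation here is differentiable, a loss defined on joint positions can backpropagate into the network that predicts the rotations, which is what allows the layer to sit inside the training loop.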
Adversarial and Cycle Consistency Training
A significant contribution of this work is the integration of adversarial training with a cycle consistency loss to enable unsupervised learning. The cycle consistency principle requires that a motion retargeted to another character and then retargeted back to the original skeleton closely resemble the input motion. This constraint eliminates the need for manually paired source and target motion sequences, which are expensive and often impractical to obtain.
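A minimal sketch of such a cycle consistency term follows, assuming a `retarget` callable that stands in for the full encoder-decoder + forward-kinematics pipeline (the name and signature are hypothetical):

```python
import torch

def cycle_consistency_loss(retarget, motion_a, skel_a, skel_b):
    """Retarget a motion A -> B -> A and penalize deviation from the input.

    retarget(motion, src_skel, tgt_skel) stands in for the full
    encoder-decoder + forward-kinematics pipeline (hypothetical signature).
    """
    motion_b = retarget(motion_a, skel_a, skel_b)       # forward pass A -> B
    motion_a_back = retarget(motion_b, skel_b, skel_a)  # back pass   B -> A
    # If retargetting is consistent, the round trip reproduces the input.
    return torch.mean((motion_a_back - motion_a) ** 2)
```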
The adversarial component further improves realism: a discriminator assesses retargeted motions against genuine samples, pushing the generator to produce more authentic, lifelike output and thereby refining the quality of motion synthesis.
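As an illustration, a generic GAN objective over motion clips might look as follows; the binary cross-entropy formulation and the discriminator interface are assumptions, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def adversarial_losses(discriminator, real_motion, retargeted_motion):
    """Generic GAN losses over motion clips (a sketch, not the paper's
    exact objective). The discriminator maps a motion clip to a logit."""
    real_logits = discriminator(real_motion)
    fake_logits = discriminator(retargeted_motion.detach())
    # Discriminator: classify real clips as 1 and retargeted clips as 0.
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits,
                                           torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits,
                                             torch.zeros_like(fake_logits))
    )
    # Generator: fool the discriminator into scoring retargeted clips as real.
    gen_logits = discriminator(retargeted_motion)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits,
                                                torch.ones_like(gen_logits))
    return d_loss, g_loss
```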
Empirical Findings and Results
Through extensive experimentation on the Mixamo dataset, a comprehensive animation library, the proposed method demonstrated state-of-the-art performance in several test scenarios. These included both known motions transferred to known characters and instances involving novel motions and unseen target characters. The results show a marked improvement in the accuracy of motion retargetting, as evidenced by reduced mean square error (MSE) across joint locations.
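For reference, the evaluation metric amounts to a mean squared error over 3D joint positions; the height normalization shown here is an assumption about how errors are made comparable across characters, not a detail confirmed by the summary above.

```python
import torch

def joint_position_mse(pred, target, char_height=1.0):
    """Mean squared error over 3D joint positions.

    pred, target: (T, J, 3) joint positions over T frames and J joints.
    char_height:  optional normalizer so errors are comparable across
                  characters of different sizes (an assumption).
    """
    return torch.mean(((pred - target) / char_height) ** 2)
```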
The model effectively utilizes the forward kinematics layer to adhere to the physical constraints of the target skeleton, preserving natural motion properties such as foot placement, which is often challenging in alternative methods. Additionally, an intriguing application is the use of monocular human videos to generate animations of 3D characters, bridging real-world human motion capture and virtual animation.
Contributions and Future Directions
The paper's contributions can be distilled into two main advancements: the incorporation of a differentiable forward kinematics layer in a neural architecture and the deployment of an adversarial cycle consistency framework for unsupervised training. These innovations hold substantial promise for further advances in character animation and pose estimation.
Future work may extend the framework to skeletons with differing joint counts or topologies, or incorporate physical interaction with the environment, such as gravity, to synthesize more complex and realistic animations. Feeding raw video sequences directly into the network could further streamline the animation pipeline, making it more adaptable and accessible.
In conclusion, this paper provides a substantial step forward in the domain of automated animation and motion synthesis, enriching the toolkit available for designers and developers in gaming, animation, and broader fields involving virtual character interaction.