Character Controllers Using Motion VAEs (2103.14274v1)

Published 26 Mar 2021 in cs.LG and cs.GR

Abstract: A fundamental problem in computer animation is that of realizing purposeful and realistic human movement given a sufficiently-rich set of motion capture clips. We learn data-driven generative models of human movement using autoregressive conditional variational autoencoders, or Motion VAEs. The latent variables of the learned autoencoder define the action space for the movement and thereby govern its evolution over time. Planning or control algorithms can then use this action space to generate desired motions. In particular, we use deep reinforcement learning to learn controllers that achieve goal-directed movements. We demonstrate the effectiveness of the approach on multiple tasks. We further evaluate system-design choices and describe the current limitations of Motion VAEs.

Citations (230)

Summary

  • The paper introduces Motion VAEs that generate high-quality human motion through an autoregressive encoder-decoder model.
  • The methodology employs a mixture-of-experts decoder and a stochastic latent space to achieve flexible, goal-directed animation control.
  • Experimental results demonstrate robust performance on target acquisition, joystick control, and maze navigation with minimized foot-skating artifacts.

Insights into "Character Controllers Using Motion VAEs"

The paper "Character Controllers Using Motion VAEs" presents a sophisticated approach to generating realistic human motion in interactive computer animation through the use of Motion Variational Autoencoders (MVAE). The research addresses the critical task of producing purposeful and high-quality character movements from motion capture datasets, leveraging autoregressive conditional VAEs to model plausible human motion dynamics.

Methodology and Implementation

The key contribution of this paper is the development and use of Motion VAEs as a generative model for human motion. The MVAE is an autoregressive model that predicts the next pose conditioned on the current pose and a sample drawn from its stochastic latent variables. These latent variables define a flexible action space that planning or control algorithms can exploit to generate goal-directed animations, here via deep reinforcement learning (DRL).
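
Once trained, the decoder alone can synthesize motion autoregressively by sampling a latent code at each frame. The following is a minimal sketch of such an uncontrolled rollout; the `decoder` callable and the pose representation are placeholders rather than the paper's exact interfaces.

```python
import torch

def random_rollout(decoder, init_pose, latent_dim=32, num_frames=120):
    """Synthesize motion by sampling the latent 'action' from the prior at every frame."""
    poses = [init_pose]
    for _ in range(num_frames):
        z = torch.randn(latent_dim)           # unconditioned sample from N(0, I)
        poses.append(decoder(z, poses[-1]))   # autoregressive next-pose prediction
    return torch.stack(poses)
```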

The generative model comprises an encoder-decoder configuration in which:

  • Encoder: Encodes the past and current pose to a latent space.
  • Decoder: Combines a sample from this latent space with the preceding pose to predict the next pose.
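
A minimal PyTorch sketch of this conditional, autoregressive encoder-decoder is shown below. Layer sizes, feature dimensions, and the loss weighting are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

POSE_DIM = 267   # illustrative pose feature size (joint positions, velocities, orientations)
LATENT_DIM = 32  # illustrative latent dimension

class MVAEEncoder(nn.Module):
    """Maps a pose transition (previous pose, next pose) to a Gaussian over the latent z."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * POSE_DIM, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
        )
        self.mu = nn.Linear(hidden, LATENT_DIM)
        self.logvar = nn.Linear(hidden, LATENT_DIM)

    def forward(self, prev_pose, next_pose):
        h = self.net(torch.cat([prev_pose, next_pose], dim=-1))
        return self.mu(h), self.logvar(h)

class MVAEDecoder(nn.Module):
    """Predicts the next pose from a latent sample, conditioned on the previous pose."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + POSE_DIM, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, POSE_DIM),
        )

    def forward(self, z, prev_pose):
        return self.net(torch.cat([z, prev_pose], dim=-1))

def reparameterize(mu, logvar):
    """Standard VAE reparameterization trick."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def training_step(enc, dec, prev_pose, next_pose, beta=0.2):
    """One conditional-VAE step: encode the transition, reconstruct the next pose."""
    mu, logvar = enc(prev_pose, next_pose)
    z = reparameterize(mu, logvar)
    recon = ((dec(z, prev_pose) - next_pose) ** 2).mean()          # pose reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL to N(0, I)
    return recon + beta * kl
```

At run time only the decoder is needed: latent samples act as the actions that drive the next-pose prediction.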

A unique aspect of the MVAE is its use of a mixture-of-experts structure for the decoder, which improves motion quality and coherence by partitioning the prediction problem among several expert networks whose contributions are blended by a gating network.
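
The following sketch illustrates the gating idea under simplifying assumptions: a fixed number of experts and blending at the output level (the paper's decoder may instead blend expert parameters, but the gating mechanism is analogous).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEDecoder(nn.Module):
    """Gated mixture-of-experts decoder (output-level blending for brevity)."""
    def __init__(self, latent_dim=32, pose_dim=267, hidden=256, num_experts=6):
        super().__init__()
        in_dim = latent_dim + pose_dim
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ELU(),
                nn.Linear(hidden, pose_dim),
            )
            for _ in range(num_experts)
        ])
        # Gating network: decides how much each expert contributes to the prediction.
        self.gate = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ELU(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, z, prev_pose):
        x = torch.cat([z, prev_pose], dim=-1)
        weights = F.softmax(self.gate(x), dim=-1)                    # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, num_experts, pose_dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)          # blended next-pose prediction
```

Intuitively, the gating lets different experts specialize in different movement modes, which is one plausible explanation for the improved coherence reported in the paper.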

Experimental Results

The paper evaluates several tasks to demonstrate the capabilities of the proposed system, including Target acquisition, Joystick control, and more complex scenarios such as Maze Navigation. Each task shows the adaptability and robustness of the controller, even when motion is initialized from challenging states such as a sprint or a resting pose (a minimal control-loop sketch follows the task list below).

  1. Target Task: The character effectively learned to navigate towards randomized targets in a predefined arena, showcasing the controller’s efficiency.
  2. Joystick Control: Simulated scenarios confirmed the character's ability to adjust speed and direction based on joystick input, achieving responsive motion dynamics.
  3. Path Follower: The system demonstrated competence in following complex curves such as a figure-eight path while remaining stable and responsive to real-time feedback from the environment.
  4. Maze Runner: The character used simple sensory inputs to navigate enclosed mazes, highlighting the controller's ability to explore novel environments.
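
To make the reinforcement-learning side concrete, here is a hedged sketch of an episode rollout for a target-reaching task, with the policy acting in the MVAE's latent space. The `policy` callable, the pose layout assumed by `root_xy`, and the reward shaping are illustrative placeholders rather than the paper's exact formulation.

```python
import torch

def root_xy(pose):
    # Assumed layout: the first two features hold the character's planar root position.
    return pose[:2]

def rollout_target_task(policy, decoder, init_pose, target_xy, horizon=300):
    """Roll out one target-reaching episode in the MVAE's latent action space.

    policy : callable (pose, goal) -> latent action z      (hypothetical interface)
    decoder: trained MVAE decoder, (z, prev_pose) -> pose   (as sketched earlier)
    """
    pose = init_pose
    total_reward = 0.0
    for _ in range(horizon):
        goal = target_xy - root_xy(pose)      # goal relative to the character (simplified)
        z = policy(pose, goal)                # the latent code *is* the action
        pose = decoder(z, pose)               # autoregressive step through the MVAE
        dist = torch.norm(target_xy - root_xy(pose))
        total_reward -= float(dist)           # shaping: get closer to the target
        if dist < 0.3:                        # illustrative success threshold
            total_reward += 10.0
            break
    return total_reward
```

In practice the policy would be trained with a standard deep RL algorithm against such episode returns; notably, the environment dynamics are supplied entirely by the learned MVAE decoder rather than by a physics simulator.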

Key Observations and Implications

  • Foot Skating Minimization: The system markedly reduces the foot-sliding artifacts commonly observed in motion synthesis, keeping foot skating close to the level of the baseline motion capture data (a sketch of one such metric follows this list).
  • Control Flexibility: The latent space serves as an effective control medium, enabling complex movements and seamless transitions without explicit foot-contact annotations.
  • Model Generalization: The application successfully generalizes beyond the original motion dataset, although the effectiveness is inherently influenced by the diversity and richness of the input data.
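
As an illustration, one common way to quantify foot skating (not necessarily the paper's exact metric) is to accumulate horizontal foot displacement on frames where the foot is near the ground:

```python
import numpy as np

def foot_skate_speed(foot_pos, ground_height=0.0, contact_thresh=0.05, fps=30):
    """Average horizontal foot sliding speed during approximate ground contact.

    foot_pos: (T, 3) array of one foot's world-space positions per frame (y-up assumed).
    Returns sliding speed in scene units per second (0.0 if the foot never touches down).
    """
    heights = foot_pos[1:, 1] - ground_height
    in_contact = heights < contact_thresh                                # near-ground frames
    horiz_disp = np.linalg.norm(np.diff(foot_pos[:, [0, 2]], axis=0), axis=1)
    if not in_contact.any():
        return 0.0
    return float(horiz_disp[in_contact].mean() * fps)
```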

Limitations and Future Directions

The research highlights several limitations, most notably the dependence on the quality and distribution of the motion capture data, which can bias the model's output and warrants deeper investigation. Future work will focus on integrating larger datasets, enriching environment interactions, and refining artist control over the resulting motion systems. Multi-agent character control and dynamic environment interaction also appear promising, potentially paving the way for further advances in interactive animation and realistic motion synthesis.

This paper underscores the potential of combining deep generative models and reinforcement learning for sophisticated animation control, adding valuable insights to the fields of character animation and interactive simulations.