Conditional Adversarial Latent Models for Directable Virtual Characters
This paper presents Conditional Adversarial Latent Models (CALM), an approach to generating controllable behaviors for interactive virtual characters. Using imitation learning, CALM learns a latent representation of movement that captures the complexity and variability of human motion while giving users direct control over character movements.
Methodology Overview
CALM's framework comprises three phases: low-level training, precision training, and inference. During low-level training, the system jointly learns a motion encoder and a motion generator using conditional adversarial imitation learning. The encoder maps raw motion-capture data into a low-dimensional semantic representation; the motion generator (the low-level policy) then uses this representation to produce movements that resemble the reference motions without strictly replicating them, opening the door to novel and diverse behaviors.
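The sketch below illustrates this two-network setup in PyTorch: an encoder that projects a window of motion frames onto a unit hypersphere, and a low-level policy conditioned on the resulting latent. All layer sizes, dimensions, and names are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionEncoder(nn.Module):
    """Maps a window of motion frames to a latent code on the unit hypersphere."""
    def __init__(self, frame_dim=64, window_len=32, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                            # (B, T, D) -> (B, T*D)
            nn.Linear(frame_dim * window_len, 512),
            nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, motion_window):
        z = self.net(motion_window)
        return F.normalize(z, dim=-1)  # project onto the L2 unit hypersphere

class LowLevelPolicy(nn.Module):
    """Produces joint actions conditioned on the character state and a latent code."""
    def __init__(self, state_dim=128, latent_dim=64, action_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 512),
            nn.ReLU(),
            nn.Linear(512, action_dim),
        )

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))
```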
In the precision-training stage, CALM trains high-level, task-driven policies that give control over the directionality of the learned motions. The high-level policy is trained to produce latent variables, which are fed into the low-level policy to command the virtual character to perform the desired task with precision.
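As a rough sketch of this stage, a small task policy can emit a unit-norm latent that the frozen low-level policy consumes; only the high-level network would receive gradients from the task reward. The dimensions and the layout of `task_obs` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighLevelPolicy(nn.Module):
    """Maps a task observation (e.g., a target heading) to a latent command."""
    def __init__(self, task_obs_dim=16, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(task_obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, task_obs):
        # Keep commands on the same unit hypersphere the encoder produces.
        return F.normalize(self.net(task_obs), dim=-1)

# One control step, reusing the LowLevelPolicy sketched earlier:
#   z = high_level(task_obs)       # latent command from the task policy
#   action = low_level(state, z)   # the frozen low-level policy executes it
```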
Finally, in the inference phase, CALM solves complex tasks without any additional training by combining the learned transitions between motions with finite state machines (FSMs). Much like video-game controls, an FSM dictates which motion to request and when to switch, so intricate tasks are solved as sequences of simpler motions.
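A toy version of such an FSM can be written as an ordered list of (motion command, completion test) pairs; the controller requests each motion in turn and advances once its condition holds. The motion names and state fields below are invented for illustration.

```python
# Ordered (motion command, completion test) pairs; names are illustrative.
fsm = [
    ("walk_to_target", lambda s: s["dist_to_target"] < 0.5),
    ("crouch",         lambda s: s["time_in_state"] > 1.0),
    ("strike",         lambda s: s["strike_landed"]),
]

def step_fsm(fsm, idx, sim_state):
    """Return the motion to request now, advancing once the current one is done."""
    _, done = fsm[idx]
    if done(sim_state):
        idx = min(idx + 1, len(fsm) - 1)  # hold the final state once reached
    return fsm[idx][0], idx
```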
Technical Details
A notable strength of CALM is that it is unsupervised, in contrast to methods that require labeled data or semantic grounding in natural language. CALM instead infers semantic meaning directly from motion similarity, which makes the model more readily applicable across diverse datasets.
The adversarial imitation-learning phase uses a conditional discriminator, which mitigates mode collapse and pushes the policy to reproduce a broad spectrum of behaviors. The encoder outputs are constrained to the L2 unit hypersphere, which stabilizes training and yields natural transitions even from out-of-distribution latent samples.
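A minimal sketch of such a conditional discriminator is shown below: it scores a state transition while conditioned on the latent of the reference motion, so each latent is tied to a specific behavior. The least-squares objective is one common choice in adversarial imitation and is an assumption here, as are all dimensions.

```python
import torch
import torch.nn as nn

class ConditionalDiscriminator(nn.Module):
    """Scores a (state, next_state) transition, conditioned on latent z."""
    def __init__(self, state_dim=128, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + latent_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, state, next_state, z):
        return self.net(torch.cat([state, next_state, z], dim=-1))

def discriminator_loss(d_real, d_fake):
    # Least-squares GAN targets: +1 for reference transitions, -1 for the
    # policy's. Conditioning on z discourages collapsing many latents onto
    # a single behavior.
    return ((d_real - 1.0) ** 2).mean() + ((d_fake + 1.0) ** 2).mean()
```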
Results and Performance Metrics
Quantitatively, CALM shows substantial improvements over existing models such as ASE in encoder quality, diversity of generated motion, and task controllability. The reported gains in generation accuracy indicate that CALM not only produces diverse, human-like motion but also maintains control over the direction of movement.
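The summary above does not spell out the metric, but one plausible way to probe generation accuracy with the pieces sketched earlier is to re-encode a generated motion and measure how close it lands to the commanded latent. The function and threshold below are hypothetical, not the paper's protocol.

```python
import torch

def generation_accuracy(encoder, commanded_z, generated_window, threshold=0.9):
    """Hypothetical check: does the generated motion re-encode near its command?"""
    with torch.no_grad():
        z_hat = encoder(generated_window)
    # Both vectors are unit length, so the dot product is cosine similarity.
    return ((z_hat * commanded_z).sum(dim=-1) > threshold).float().mean()
```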
Furthermore, CALM solves unseen tasks through zero-shot inference, using FSMs to guide characters through sequences of specified motions without any new training. This highlights CALM's scalability and its potential for integration into real-world applications such as gaming and animation.
Implications and Future Directions
The implications of CALM are wide-ranging: it offers a robust model capable of generating intricate, lifelike motions for virtual characters across a variety of platforms. Its unsupervised nature and end-to-end learning framework position it well for further advances in AI-driven animation and interactive applications. Future work may focus on more precise control of motion directionality and on automated ways to integrate complex acrobatic or interactive movements with environmental constraints. Improving the robustness of the FSMs and addressing rendering artifacts will also be important for broader adoption in industry settings.
In summary, CALM offers a versatile and flexible approach to character animation, pushing the boundaries of what machine learning can accomplish in virtual behavior synthesis. As researchers continue to explore these avenues, the practical and theoretical implications of such models will keep evolving, suggesting a promising trajectory for AI in creative and interactive domains.