
MoGlow: Probabilistic and controllable motion synthesis using normalising flows (1905.06598v3)

Published 16 May 2019 in cs.LG, cs.GR, eess.IV, and stat.ML

Abstract: Data-driven modelling and synthesis of motion is an active research area with applications that include animation, games, and social robotics. This paper introduces a new class of probabilistic, generative, and controllable motion-data models based on normalising flows. Models of this kind can describe highly complex distributions, yet can be trained efficiently using exact maximum likelihood, unlike GANs or VAEs. Our proposed model is autoregressive and uses LSTMs to enable arbitrarily long time-dependencies. Importantly, it is also causal, meaning that each pose in the output sequence is generated without access to poses or control inputs from future time steps; this absence of algorithmic latency is important for interactive applications with real-time motion control. The approach can in principle be applied to any type of motion since it does not make restrictive, task-specific assumptions regarding the motion or the character morphology. We evaluate the models on motion-capture datasets of human and quadruped locomotion. Objective and subjective results show that randomly-sampled motion from the proposed method outperforms task-agnostic baselines and attains a motion quality close to recorded motion capture.


Summary

  • The paper introduces MoGlow, a normalising flow model that overcomes regression to mean poses by capturing a variety of realistic motion outputs.
  • It leverages invertible neural transformations with LSTM integration and data dropout to robustly capture complex, long-term motion dependencies.
  • Experimental evaluations on human and quadruped locomotion datasets show that MoGlow outperforms task-agnostic baselines and supports real-time, diverse motion synthesis.

An Expert Overview of "MoGlow: Probabilistic and Controllable Motion Synthesis Using Normalising Flows"

The paper "MoGlow: Probabilistic and Controllable Motion Synthesis Using Normalising Flows" by Gustav Eje Henter, Simon Alexanderson, and Jonas Beskow presents an approach to synthesising motion sequences with a probabilistic model based on normalising flows. The work addresses a central challenge in motion synthesis: building data-driven models that can describe complex distributions yet can still be trained efficiently with exact maximum likelihood, unlike GANs or VAEs.

Contributions and Scientific Approach

MoGlow is built on normalising flows, which transform a simple base distribution into a complex one through invertible, differentiable neural network transformations. This lets the model learn the distribution of motion data efficiently and draw samples from it, which gives it several advantages over earlier models.
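The transformation described above rests on the standard change-of-variables formula. As a sketch in generic notation (the symbols f, z, and x are illustrative and not taken from the paper), an invertible map f applied to a standard-normal latent z gives an exactly computable log-density for the data x:

```latex
x = f(z), \quad z \sim \mathcal{N}(0, I), \qquad
\log p_X(x) = \log p_Z\!\left(f^{-1}(x)\right)
            + \log\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|
```

Training by exact maximum likelihood amounts to maximising this quantity over the training data, and sampling amounts to drawing z and evaluating f(z).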

  1. Probabilistic Modeling: Unlike deterministic models that suffer from regression to a mean pose, MoGlow’s probabilistic nature captures the wide variety of poses that a given control input could produce. This leads to more lifelike, varied motion because the output is not restricted to a single, average trajectory.
  2. Normalising Flows: The architecture uses Glow-based normalising flows to represent highly expressive distributions while keeping the log-likelihood exactly computable. This sidesteps the limitations of GANs, which struggle with training stability, and VAEs, which often fall short in capturing complex motion distributions accurately.
  3. Task Agnosticism and Causality: MoGlow makes no task-specific assumptions about the motion or the character, and it does not need future poses or control inputs to generate the current pose. The resulting absence of algorithmic latency is crucial in interactive scenarios that require immediate feedback.
  4. Integration of LSTM and Data Dropout: LSTMs retain long-range time dependencies, while data dropout stabilises training and improves adherence to the control signal. Together they support a range of motion types, human or non-human.
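To make the invertibility and exact-likelihood points above concrete, here is a minimal NumPy sketch of a single conditional affine coupling step, the kind of building block used in Glow-style flows. It is an illustration under stated assumptions, not the authors' implementation: the class name `AffineCoupling`, the tiny random MLP, and all dimensions are hypothetical, and the conditioning network here is stateless, whereas the summary above says MoGlow uses LSTMs for that role.

```python
import numpy as np

rng = np.random.default_rng(0)


class AffineCoupling:
    """One conditional affine coupling step (hypothetical minimal version)."""

    def __init__(self, dim, cond_dim, hidden=16):
        self.half = dim // 2
        out = dim - self.half
        # Small random MLP standing in for the conditioning network
        # (assumption: stateless; MoGlow is described as using LSTMs).
        self.w1 = rng.normal(scale=0.1, size=(self.half + cond_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, 2 * out))

    def _params(self, x_a, cond):
        h = np.tanh(np.concatenate([x_a, cond], axis=-1) @ self.w1)
        log_s, t = np.split(h @ self.w2, 2, axis=-1)
        return np.tanh(log_s), t  # bounded log-scale for stability

    def forward(self, x, cond):
        """Data -> latent. Returns z and log|det J| for each sample."""
        x_a, x_b = x[..., :self.half], x[..., self.half:]
        log_s, t = self._params(x_a, cond)
        z_b = x_b * np.exp(log_s) + t
        return np.concatenate([x_a, z_b], axis=-1), log_s.sum(axis=-1)

    def inverse(self, z, cond):
        """Latent -> data. Exact inverse of forward()."""
        z_a, z_b = z[..., :self.half], z[..., self.half:]
        log_s, t = self._params(z_a, cond)
        x_b = (z_b - t) * np.exp(-log_s)
        return np.concatenate([z_a, x_b], axis=-1)


def log_likelihood(layer, x, cond):
    """Exact log-density of x under a standard-normal base distribution."""
    z, log_det = layer.forward(x, cond)
    d = z.shape[-1]
    log_base = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * d * np.log(2 * np.pi)
    return log_base + log_det


# Demo: 4 "poses" of dimension 6, each with a 3-dimensional control input.
layer = AffineCoupling(dim=6, cond_dim=3)
x = rng.normal(size=(4, 6))
cond = rng.normal(size=(4, 3))
z, _ = layer.forward(x, cond)
x_rec = layer.inverse(z, cond)
ll = log_likelihood(layer, x, cond)
```

Because half of the input passes through unchanged, the scale and shift can be recomputed from it during inversion, so the step is exactly invertible and its log-determinant is simply the sum of the log-scales. A full flow stacks many such steps with permutations in between; those are omitted here.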

Results and Evaluation

The authors conducted extensive experiments on both human and quadruped locomotion datasets, validating MoGlow’s ability to produce convincingly realistic motion. The model was assessed through subjective ratings from a large, diverse set of participants and through objective footstep analysis, which quantifies motion quality with metrics such as foot-sliding; MoGlow significantly reduced foot-sliding.

  • Subjective Evaluations: MoGlow consistently outperformed previous task-agnostic methods like RNNs and VAEs. It approached or matched the quality of task-specific state-of-the-art methods such as QuaterNet for human locomotion and mode-adaptive neural networks for quadrupedal motion, demonstrating its efficacy and generalizability.
  • Objective Evaluations: The model achieved a balance between motion diversity and adherence to the control signal without introducing algorithmic latency, thus supporting applications where immediate adaptation to new inputs is necessary.
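As an illustration of what a foot-sliding measurement can look like, the sketch below computes the mean horizontal speed of a foot joint over frames where it is close to the ground. This is a generic heuristic and not the paper's footstep analysis, whose exact definition is not given in this summary; the function name `foot_sliding`, the height threshold, the frame rate, and the y-up axis convention are all assumptions.

```python
import numpy as np


def foot_sliding(foot_pos, height_thresh=0.05, fps=60.0):
    """Mean horizontal foot speed (m/s) over frames judged to be in contact.

    foot_pos: array of shape (T, 3) with x, y, z positions in metres,
              where y is assumed to be the vertical axis.
    height_thresh: hypothetical contact threshold on foot height (metres).
    fps: assumed capture frame rate.
    """
    foot_pos = np.asarray(foot_pos, dtype=float)
    vel = np.diff(foot_pos, axis=0) * fps               # (T-1, 3) velocities
    horiz_speed = np.linalg.norm(vel[:, [0, 2]], axis=1)
    contact = foot_pos[1:, 1] < height_thresh           # foot near the ground
    if not contact.any():
        return 0.0
    return float(horiz_speed[contact].mean())


# A planted foot that does not move, and one that slides 1 cm per frame.
planted = np.zeros((10, 3))
sliding = np.zeros((10, 3))
sliding[:, 0] = 0.01 * np.arange(10)

planted_score = foot_sliding(planted)
sliding_score = foot_sliding(sliding)
```

A lower score indicates less sliding: a foot that stays put while in contact scores zero, and any horizontal drift while the foot is supposedly planted raises the score.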

Implications and Future Work

MoGlow's ability to synthesize high-quality motion without task-specific assumptions broadens the use of data-driven motion models in fields like animation and robotics. It opens the way for future research to adapt the model to other motion types and control schemes, and potentially to integrate it into systems that demand robust, real-time human-machine interaction.

Furthermore, harnessing MoGlow for generating other complex motion scenarios, such as gestures synchronized with speech or emotion-modulated motion, would be valuable endeavors. Future enhancements in model efficiency, perhaps by merging with techniques like model distillation or incorporating stronger physical constraints, would further broaden MoGlow's practical applicability in dynamic environments.

In conclusion, MoGlow represents a compelling convergence of probabilistic modeling, machine learning, and motion synthesis technologies, emphasizing a broader capability of normalizing-flow-based approaches to model temporal and versatile data distributions in real-world applications.
