Dynamic Multiscale Graph Neural Networks for 3D Skeleton-Based Human Motion Prediction (2003.08802v1)

Published 17 Mar 2020 in cs.CV, cs.LG, and stat.ML

Abstract: We propose novel dynamic multiscale graph neural networks (DMGNN) to predict 3D skeleton-based human motions. The core idea of DMGNN is to use a multiscale graph to comprehensively model the internal relations of a human body for motion feature learning. This multiscale graph is adaptive during training and dynamic across network layers. Based on this graph, we propose a multiscale graph computational unit (MGCU) to extract features at individual scales and fuse features across scales. The entire model is action-category-agnostic and follows an encoder-decoder framework. The encoder consists of a sequence of MGCUs to learn motion features. The decoder uses a proposed graph-based gate recurrent unit to generate future poses. Extensive experiments show that the proposed DMGNN outperforms state-of-the-art methods in both short and long-term predictions on the datasets of Human 3.6M and CMU Mocap. We further investigate the learned multiscale graphs for the interpretability. The codes could be downloaded from https://github.com/limaosen0/DMGNN.

Summary

The paper presents DMGNN, a framework that uses a multiscale graph of the human body to capture relations both within and across body scales.
Its two main modules are the MGCU, which extracts and fuses spatial features across scales, and the G-GRU, which models temporal dependencies to generate future poses.
Experiments on Human 3.6M and CMU Mocap show lower prediction error than prior methods in both short-term (up to 400 ms) and long-term human motion prediction.

Review of "Dynamic Multiscale Graph Neural Networks for 3D Skeleton-Based Human Motion Prediction"

The paper "Dynamic Multiscale Graph Neural Networks for 3D Skeleton-Based Human Motion Prediction" introduces a multiscale graph neural network framework, DMGNN, for predicting future human poses from an observed skeleton sequence. The framework exploits the hierarchical and relational structure of the human skeleton and models both spatial dependencies between body components and temporal dependencies across frames.

Core Contributions

The key contribution of this research is the introduction of the Dynamic Multiscale Graph Neural Network (DMGNN), which incorporates the following elements:

Multiscale Graph Representation: DMGNN utilizes a novel multiscale representation of the human body. This representation includes nodes representing body components at various scales and edges capturing both intra-scale and inter-scale relationships. The model encompasses three body scales: individual joints, low-level parts, and high-level parts.
Multiscale Graph Computational Unit (MGCU): The paper introduces MGCU as a core component of the DMGNN, designed to extract and fuse motion features at multiple scales. Each MGCU includes Single-Scale Graph Convolution Blocks (SS-GCB) and Cross-Scale Fusion Blocks (CS-FB), which enable the model to capture comprehensive body dynamics.
Graph-Based Gated Recurrent Unit (G-GRU): For temporal modeling, the paper proposes a G-GRU, which incorporates a trainable graph within its recurrent framework to enhance state propagation and predict sequences accurately.
Use of High-Order Differences: The paper applies difference operators to the pose sequence to obtain multiple orders of motion differences. These serve as proxies for positions, velocities, and accelerations, giving the model motion information beyond raw poses.
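
As a rough illustration of the first and last items above, the following NumPy sketch builds coarser-scale features by averaging joints into parts and computes zero-, first-, and second-order differences of a pose sequence. The joint grouping `PART_GROUPS`, the function names, the averaging rule, and the zero-padding are all assumptions made for illustration; they are not taken from the paper or its released code.

```python
import numpy as np

# Hypothetical grouping of a toy 6-joint skeleton into 3 parts.
# The paper's actual joint-to-part assignment is not reproduced here.
PART_GROUPS = [[0, 1], [2, 3], [4, 5]]


def pool_joints_to_parts(poses, groups):
    """Average joint features within each group to get part-level features.

    poses:  array of shape (T, J, D) = (frames, joints, feature dims)
    groups: list of joint-index lists, one list per part
    returns array of shape (T, len(groups), D)
    """
    return np.stack([poses[:, g, :].mean(axis=1) for g in groups], axis=1)


def motion_differences(poses, max_order=2):
    """Return [x, dx, d2x, ...] along the time axis.

    The k-th order difference has T - k frames, so it is zero-padded at the
    start to keep every array at length T (a choice made for this sketch).
    Assumes T > max_order.
    """
    feats = [poses]
    for order in range(1, max_order + 1):
        diff = np.diff(poses, n=order, axis=0)  # shape (T - order, J, D)
        pad = np.zeros((order,) + poses.shape[1:], dtype=poses.dtype)
        feats.append(np.concatenate([pad, diff], axis=0))
    return feats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    joints = rng.normal(size=(10, 6, 3))        # 10 frames, 6 joints, 3D
    parts = pool_joints_to_parts(joints, PART_GROUPS)
    position, velocity, acceleration = motion_differences(joints)
    print(parts.shape, velocity.shape, acceleration.shape)
```

Applying `pool_joints_to_parts` a second time with a coarser grouping would give a third scale, mirroring the joints, low-level parts, and high-level parts described above.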

Experimental Results

Experiments were conducted on two widely used motion prediction datasets: Human 3.6M and CMU Mocap. DMGNN outperforms the state-of-the-art baselines it is compared against in both short-term and long-term motion prediction across various action categories.

Short-Term Predictions: DMGNN achieves lower mean angle error (MAE) than baselines such as Res-sup and Traj-GCN for predictions within 400 milliseconds.
Long-Term Predictions: DMGNN also retains its advantage in the harder long-term setting, indicating that it captures the longer-range temporal dynamics this task requires.
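
For readers unfamiliar with the metric, the sketch below computes an angle error at fixed prediction horizons. It assumes the convention common in this literature (Euclidean distance between predicted and ground-truth Euler-angle vectors, averaged over test sequences) and a 25 fps frame rate, so that one frame spans 40 ms. Neither detail is stated in this review, and the paper's own evaluation script may differ.

```python
import numpy as np

FRAME_MS = 40  # assumption: 25 fps sequences, i.e. 40 ms per frame


def angle_error_at_horizons(pred, target, horizons_ms=(80, 160, 320, 400)):
    """Mean angle error at selected horizons.

    pred, target: arrays of shape (N, T, D) holding Euler angles for
                  N sequences, T predicted frames, D angle dimensions.
    returns: dict mapping horizon in ms to the error averaged over sequences.
    """
    errors = {}
    for ms in horizons_ms:
        t = ms // FRAME_MS - 1  # index of the predicted frame ending at `ms`
        per_sequence = np.linalg.norm(pred[:, t] - target[:, t], axis=-1)
        errors[ms] = float(per_sequence.mean())
    return errors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(8, 10, 48))                    # made-up data
    pred = target + 0.1 * rng.normal(size=target.shape)
    print(angle_error_at_horizons(pred, target))
```

Long-term horizons can be evaluated the same way by passing larger values in `horizons_ms`, provided `pred` and `target` contain enough frames.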

Interpretability and Insights

One of the notable aspects of DMGNN is its capacity for interpretability. The paper examines the learned multiscale graphs, which adaptively change across network layers and during training. This adaptability allows DMGNN to capture distinct motion patterns that reflect the functional and coordinated movements inherent in human activities.
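
One simple way to inspect a learned graph of this kind is to list its strongest off-diagonal connections. The sketch below does this for an arbitrary adjacency matrix; the function name, the part names, and the matrix values are made up for illustration and do not come from the paper or its trained models.

```python
import numpy as np


def strongest_edges(adj, names, k=3):
    """Return the k largest off-diagonal entries of `adj` by magnitude.

    adj:   square array where adj[i, j] is the learned weight from i to j
    names: list of node labels, one per row of adj
    returns: list of (source, destination, |weight|) tuples, strongest first
    """
    a = np.abs(np.asarray(adj, dtype=float))  # np.abs returns a new array
    np.fill_diagonal(a, 0.0)                  # ignore self-connections
    order = np.argsort(a, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, a.shape)
    return [(names[i], names[j], float(a[i, j])) for i, j in zip(rows, cols)]


if __name__ == "__main__":
    # Hypothetical part-level graph with invented weights.
    names = ["left arm", "right arm", "left leg", "right leg"]
    adj = np.array([
        [1.00, 0.10, 0.90, 0.20],
        [0.15, 1.00, 0.30, 0.80],
        [0.05, 0.35, 1.00, 0.40],
        [0.25, 0.12, 0.45, 1.00],
    ])
    for src, dst, w in strongest_edges(adj, names, k=2):
        print(f"{src} -> {dst}: {w:.2f}")
```

Comparing such lists across layers, or across checkpoints saved during training, is one way to see how an adaptive graph changes.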

Implications and Future Work

DMGNN marks a substantial advance in modeling human motion with graph-based neural networks. Practically, the framework has potential applications in areas such as human-computer interaction, robotics, and video surveillance, where understanding and predicting human movement is crucial.

Theoretically, the approach sets the stage for future work in graph neural network research, particularly in how dynamic, adaptive graphs can improve learning representations in complex systems. Future research could explore the application of DMGNN to other domains with multiscale relational data, as well as integrating more complex graph structures and recurrent units to further enhance predictive performance.

In conclusion, the DMGNN framework represents a compelling advance in the domain of human motion prediction, combining the strengths of multiscale graph representations with dynamic neural network learning. The results indicate its potential not only in setting new benchmarks for performance but also in providing new tools for interpreting the intricacies of human motion at multiple scales.
