MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction (2108.07152v2)

Published 16 Aug 2021 in cs.CV

Abstract: Human motion prediction is a challenging task due to the stochasticity and aperiodicity of future poses. Recently, graph convolutional network has been proven to be very effective to learn dynamic relations among pose joints, which is helpful for pose prediction. On the other hand, one can abstract a human pose recursively to obtain a set of poses at multiple scales. With the increase of the abstraction level, the motion of the pose becomes more stable, which benefits pose prediction too. In this paper, we propose a novel Multi-Scale Residual Graph Convolution Network (MSR-GCN) for human pose prediction task in the manner of end-to-end. The GCNs are used to extract features from fine to coarse scale and then from coarse to fine scale. The extracted features at each scale are then combined and decoded to obtain the residuals between the input and target poses. Intermediate supervisions are imposed on all the predicted poses, which enforces the network to learn more representative features. Our proposed approach is evaluated on two standard benchmark datasets, i.e., the Human3.6M dataset and the CMU Mocap dataset. Experimental results demonstrate that our method outperforms the state-of-the-art approaches. Code and pre-trained models are available at https://github.com/Droliven/MSRGCN.

Summary

The paper presents a novel multi-scale residual GCN framework that effectively captures hierarchical pose abstractions for accurate human motion prediction.
It employs descending and ascending GCN blocks with residual connections to stabilize and refine spatiotemporal features from pose data.
Experimental results on Human3.6M and CMU Mocap datasets show lower MPJPE, outperforming previous state-of-the-art models.

Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction

The paper "MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction" introduces a novel framework for predicting human motion by leveraging Graph Convolutional Networks (GCNs) within a multi-scale, residual learning paradigm. This approach is motivated by the necessity to capture the stochastic and aperiodic nature of human movements, which present significant challenges for traditional prediction models.

Summary of the Methodology

Human motion prediction requires modeling both the spatial dependencies among joints and how they evolve over time. The proposed MSR-GCN addresses this by treating a human pose as a fully connected graph whose nodes are the joints. Each pose is also abstracted recursively into a set of coarser poses; as the abstraction level increases, the motion becomes more stable and therefore easier to predict.
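The idea of a graph convolution over a fully connected pose graph can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the dense adjacency matrix, the tanh activation, and all sizes (22 joints-as-nodes, 35 input features, 64 output features) are assumptions chosen only to make the example concrete.

```python
import numpy as np

def gcn_layer(x, adj, weight):
    """One dense graph convolution: mix nodes with adj, mix features with weight.

    x:      (num_nodes, in_dim)   per-node features
    adj:    (num_nodes, num_nodes) dense adjacency (learned in a real model)
    weight: (in_dim, out_dim)     feature projection
    """
    return np.tanh(adj @ x @ weight)

# Hypothetical sizes, for illustration only.
num_nodes, in_dim, out_dim = 22, 35, 64

rng = np.random.default_rng(0)
x = rng.standard_normal((num_nodes, in_dim))
# A fully connected graph has no fixed sparsity pattern, so every entry of
# the adjacency matrix is a free parameter; here it is just randomly set.
adj = rng.standard_normal((num_nodes, num_nodes)) * 0.1
weight = rng.standard_normal((in_dim, out_dim)) * 0.1

y = gcn_layer(x, adj, weight)
print(y.shape)
```

Because the adjacency matrix is dense, every joint can influence every other joint in a single layer; in a trained network those entries would be learned rather than sampled.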

The architecture of MSR-GCN has three main components: start GCNs, descending and ascending GCN blocks, and end GCNs. The start GCNs first map the pose sequence into a feature space. The descending GCN blocks then extract features from fine to coarse scale, where motion is more stable, and the ascending blocks extract features from coarse back to fine scale. The end GCNs combine the features at each scale and decode them into pose predictions at that scale. Intermediate supervision is imposed on the predictions at every scale, which pushes the network to learn more representative features. Rather than predicting poses directly, the network decodes the residuals between the input and target poses.
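The multi-scale abstraction, residual output, and intermediate supervision described above can be sketched in a few lines. This is a hedged illustration rather than the paper's code: the joint groupings (6 joints to 3 to 1), the averaging used to coarsen a pose, the choice of the last observed pose as the residual baseline, and the helper names are all assumptions.

```python
import numpy as np

# Hypothetical toy groupings: 6 joints -> 3 groups -> 1 group.
GROUPS_FINE_TO_MID = [[0, 1], [2, 3], [4, 5]]
GROUPS_MID_TO_COARSE = [[0, 1, 2]]

def abstract_pose(seq, groups):
    """Coarsen a (frames, joints, 3) sequence by averaging each joint group."""
    return np.stack([seq[:, idx, :].mean(axis=1) for idx in groups], axis=1)

def build_pyramid(seq):
    """Return the sequence at fine, middle, and coarse scales."""
    mid = abstract_pose(seq, GROUPS_FINE_TO_MID)
    coarse = abstract_pose(mid, GROUPS_MID_TO_COARSE)
    return [seq, mid, coarse]

def apply_residual(last_input_pose, predicted_residual):
    """Add a predicted residual to a baseline pose (assumed: last observed pose)."""
    return last_input_pose[None] + predicted_residual

def multi_scale_loss(preds, targets):
    """Intermediate supervision: sum the mean per-joint L2 error over all scales."""
    return float(sum(np.linalg.norm(p - t, axis=-1).mean()
                     for p, t in zip(preds, targets)))

rng = np.random.default_rng(0)
target = rng.standard_normal((10, 6, 3))       # 10 future frames, 6 joints
target_pyramid = build_pyramid(target)

# Stand-in for network outputs: each scale's target shifted by a small offset.
pred_pyramid = [t + 0.01 for t in target_pyramid]
print([p.shape for p in pred_pyramid])
print(multi_scale_loss(pred_pyramid, target_pyramid))
```

Summing the loss over scales means an error at the coarse scale is penalized even when the fine-scale prediction happens to look plausible, which is one way to read the role of intermediate supervision.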

Experimental Results

MSR-GCN was evaluated on two standard benchmark datasets: Human3.6M and CMU Mocap. The reported results indicate that it outperforms previous state-of-the-art methods, including RNN-based models and basic GCN frameworks, in both short-term and long-term prediction. In particular, the method achieves lower Mean Per Joint Position Error (MPJPE) across a variety of complex human activities, which suggests it handles high-frequency limb movements better than the baselines.
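For reference, MPJPE is the Euclidean distance between predicted and ground-truth joint positions, averaged over joints (and usually over frames). The sketch below assumes arrays of shape (frames, joints, 3); the function name and the toy data are illustrative.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean Per Joint Position Error for (frames, joints, 3) arrays."""
    per_joint_error = np.linalg.norm(pred - target, axis=-1)  # (frames, joints)
    return float(per_joint_error.mean())

# Toy check: every joint is displaced by the vector (3, 4, 0), length 5.
target = np.zeros((2, 3, 3))
pred = target + np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, target))  # 5.0
```

The result is in whatever units the joint coordinates use, so lower values mean the predicted skeleton lies closer to the ground truth.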

Implications and Future Developments

The paper contributes significantly to the field by demonstrating how structured pose abstractions in a multi-scale format can enhance prediction accuracy while maintaining computational efficiency. This framework extends the usage of GCNs to more complex domains where temporal and spatial intricacies are deeply intertwined. Practically, improvements in motion prediction can directly benefit fields such as surveillance, animation, and human-computer interaction.

In terms of theoretical implications, the MSR-GCN framework suggests that incorporating hierarchical learning within graph structures can be a powerful tool for other graph-based prediction tasks. This insight may encourage further exploration of joint abstractions in other domains, potentially leading to advancements in dynamic system modeling and spatial-temporal data analysis.

Conclusion

Overall, MSR-GCN offers a multi-scale, residual approach to human motion prediction, contributing both a novel methodology and demonstrated gains on standard benchmarks. Future research could explore more dynamic joint hierarchies and alternative graph structures, which may further improve motion prediction models.

GitHub

GitHub - Droliven/MSRGCN: Official implementation of MSR-GCN (ICCV2021 paper) (66 stars)