- The paper introduces Symbiotic Graph Neural Networks (Sym-GNN), a unified model designed to jointly address 3D skeleton-based human action recognition and motion prediction by allowing these tasks to mutually inform each other.
- Sym-GNN employs multi-scale graph convolutions, combining joint-scale (actional and structural) graphs with part-scale graphs, alongside a complementary bone-based network to capture detailed spatial and temporal dynamics.
- Evaluated on four datasets, Sym-GNN achieved superior performance in both action recognition and motion prediction, demonstrating potential for real-time applications and advancing graph-based visual computing.
Symbiotic Graph Neural Networks for 3D Skeleton-based Human Action Recognition and Motion Prediction
The paper explores the intricate challenges posed by 3D skeleton-based human action recognition and motion prediction. These tasks, central to human activity understanding, have often been studied in isolation, neglecting their inherent interconnections. Additionally, prior approaches have insufficiently captured the complex relations within the human body. To address these limitations, the authors propose a unified symbiotic model, designed to jointly handle both tasks through the framework of Symbiotic Graph Neural Networks (Sym-GNN).
Key Contributions
- Joint Task Approach: The paper introduces a multitask framework that concurrently addresses action recognition and motion prediction through a symbiotic design in which the two tasks inform and enhance each other. The recognition head supplies action-category context that guides prediction, while the prediction head refines recognition by encouraging richer, temporally detailed features.
- Multi-Scale Graph Convolutions: The essence of Sym-GNN lies in its ability to extract detailed features by employing multi-scale graph convolutional networks. This involves joint-scale graphs for fine-grained interactions between individual body joints and part-scale graphs for broader patterns across body parts. The inclusion of these dual scales allows a comprehensive capture of spatial and temporal dynamics fundamental to human actions.
- Actional and Structural Graphs: At the joint scale, the paper delineates two types of graphs—actional and structural. Actional graphs model task-specific relationships deduced from the data, while structural graphs represent physical dependencies based on body anatomy. These graphs are leveraged through specialized joint-scale graph and temporal convolutions for enhanced feature extraction.
- Dual Bone-Based Networks: A complementary bone-based network operates on skeletal interconnections rather than raw joint positions, capturing bone length and orientation dynamics that are particularly useful for motion prediction.
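The symbiotic objective behind the joint task approach can be sketched as a weighted sum of a recognition loss and a prediction loss. The weight `lam` and the specific loss forms below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy for one sample: the recognition head's loss."""
    z = logits - logits.max()                      # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def joint_loss(action_logits, label, predicted_motion, future_motion, lam=0.5):
    """Symbiotic objective: recognition loss + weighted prediction loss.

    `lam` is a hypothetical trade-off weight; the prediction loss is
    taken here as mean squared error over future joint coordinates.
    """
    rec_loss = softmax_cross_entropy(action_logits, label)
    pred_loss = np.mean((predicted_motion - future_motion) ** 2)
    return rec_loss + lam * pred_loss

# Usage: 5 action classes; 10 future frames x 25 joints x 3 coordinates.
rng = np.random.default_rng(0)
logits = rng.normal(size=5)
future = rng.normal(size=(10, 25, 3))
loss = joint_loss(logits, 2, future + 0.1, future)
```

Sharing one backbone while summing the two losses is what lets gradients from each head shape the features used by the other.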
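The multi-scale idea, with the joint scale blending a structural (anatomical) graph and a learned actional graph, can be sketched as a single layer. The blending weight `alpha`, the mean-pooling scheme for body parts, and the ReLU fusion are illustrative choices, not the paper's exact architecture.

```python
import numpy as np

def normalize(A):
    """Add self-loops and row-normalize an adjacency matrix."""
    A = A + np.eye(A.shape[0])
    return A / A.sum(axis=1, keepdims=True)

def multiscale_gconv(X, A_struct, A_act, part_of, W_joint, W_part, alpha=0.5):
    """One hypothetical multi-scale layer.

    Joint scale: graph convolution over a blend of the structural and the
    learned actional graph.  Part scale: joint features mean-pooled within
    each body-part group, transformed, then broadcast back to joints.
    """
    A = normalize((1 - alpha) * A_struct + alpha * A_act)  # blended joint graph
    H_joint = A @ X @ W_joint                              # joint-scale conv
    n_parts = part_of.max() + 1
    P = np.zeros((n_parts, X.shape[0]))                    # pooling matrix
    for j, p in enumerate(part_of):
        P[p, j] = 1.0
    P /= P.sum(axis=1, keepdims=True)                      # mean over each part
    H_part = P.T @ (P @ X @ W_part)                        # pool -> transform -> unpool
    return np.maximum(H_joint + H_part, 0.0)               # ReLU fusion

# Usage: a toy 5-joint chain split into 2 body parts.
J, C, D = 5, 3, 4
rng = np.random.default_rng(1)
X = rng.normal(size=(J, C))
A_struct = np.zeros((J, J))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A_struct[a, b] = A_struct[b, a] = 1.0
A_act = rng.uniform(size=(J, J))        # stands in for a learned actional graph
part_of = np.array([0, 0, 0, 1, 1])
H = multiscale_gconv(X, A_struct, A_act, part_of,
                     rng.normal(size=(C, D)), rng.normal(size=(C, D)))
```

The part-scale branch lets distant joints belonging to the same limb exchange information in one hop, which a purely joint-scale graph cannot do.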
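The bone representation used by the complementary network can be sketched as each joint's offset from its kinematic parent; the parent table below is a hypothetical 5-joint chain, not a real skeleton definition.

```python
import numpy as np

# Hypothetical parent table for a 5-joint chain (the root is its own parent).
PARENTS = np.array([0, 0, 1, 2, 3])

def bone_features(joints, parents=PARENTS):
    """Bone vectors: each joint's offset from its kinematic parent.

    Encodes bone length and orientation, complementing raw positions;
    the root joint's bone is the zero vector by construction.
    """
    return joints - joints[parents]

# Usage: one frame of 3D joint positions along an L-shaped chain.
frame = np.array([[0., 0., 0.],
                  [0., 1., 0.],
                  [0., 2., 0.],
                  [1., 2., 0.],
                  [2., 2., 0.]])
bones = bone_features(frame)
```

Because bone vectors are translation-invariant, they change only when the pose itself changes, which is one reason orientation dynamics help motion prediction.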
Experimental Evaluation
The efficacy of Sym-GNN is evaluated across four challenging datasets—NTU-RGB+D, Kinetics, Human3.6M, and CMU Mocap—where it outperforms state-of-the-art methods in both action recognition and motion prediction, with notable gains in predictive accuracy and robustness to input perturbations.
Implications and Future Directions
The proposed symbiotic model presents significant implications for real-time applications in surveillance, human-machine interaction, and sports analysis, where nuanced understanding of human actions and forecasts of future positions are crucial. The integration of multi-scale graph representations advances the state of visual computing in harnessing high-dimensional data structures, setting the stage for further exploration in adaptive and data-driven graph generation.
Looking forward, the paper could inspire further research into the dynamic adaptation of actional graphs that autonomously evolve with the input data stream. Moreover, ongoing advancements in AI hardware could mitigate computational overheads, paving the way for real-time deployment of these graph-based models in diverse environments. This line of work also opens avenues for exploring transfer learning capabilities across different domains of motion data, potentially enhancing model generalization across varied datasets.
In sum, the integration of multitasking frameworks with complex graph-based feature extraction presents a substantial leap in 3D human activity analysis, demonstrating the viability of Sym-GNN as a comprehensive tool for both academic investigation and practical application.