Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching (1911.04131v1)

Published 11 Nov 2019 in cs.CV

Abstract: Human action recognition from skeleton data, fueled by the Graph Convolutional Network (GCN), has attracted a lot of attention due to its powerful capability of modeling non-Euclidean structured data. However, many existing GCN methods provide a pre-defined graph and fix it through the entire network, which can lose implicit joint correlations. Besides, the mainstream spectral GCN is approximated by a first-order hop, so higher-order connections are not well involved. Therefore, huge efforts are required to explore a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for skeleton-based action recognition. Specifically, we enrich the search space by providing multiple dynamic graph modules after fully exploring the spatial-temporal correlations between nodes. Besides, we introduce multiple-hop modules and expect to break the limitation of representational capacity caused by the first-order approximation. Moreover, a sampling- and memory-efficient evolution strategy is proposed to search for an optimal architecture for this task. The resulting architecture proves the effectiveness of the higher-order approximation and the dynamic graph modeling mechanism with temporal interactions, which have barely been discussed before. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale datasets, and the results show that our model achieves state-of-the-art results.

Citations (306)


Summary

The paper introduces dynamic graph modeling by replacing fixed graphs with adaptive spatial-temporal modules to capture joint correlations.
The paper incorporates multi-hop modules to capture higher-order connections, enhancing the GCN's capacity for learning complex spatial relationships.
The paper leverages a tailored NAS framework to automatically optimize GCN architectures, achieving state-of-the-art performance on benchmark datasets.

An Evaluation of Learning Graph Convolutional Network for Skeleton-based Human Action Recognition via Neural Searching

The paper "Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching," by Wei Peng et al., proposes a method for skeleton-based human action recognition. It targets two limitations of hand-designed Graph Convolutional Networks (GCNs) on non-Euclidean skeleton data, namely fixed pre-defined graphs and first-order approximation, and uses Neural Architecture Search (NAS) to optimize the GCN architecture specifically for this task.

Core Contributions

Dynamic Graph Modeling: The authors challenge the prevalent approach of fixing a pre-defined graph throughout the GCN, which often fails to capture the implicit joint correlations needed to recognize nuanced human actions. They address this limitation with a dynamic graph modeling strategy: multiple spatial-temporal graph modules compute joint connectivity from how nodes interact over time and space, so the graph can adapt throughout the network.
Higher-Order Connections: Traditional GCNs often use a first-order Chebyshev polynomial approximation to reduce computational overhead, which restricts the network's representational capacity. This work incorporates multi-hop modules that let the model capture higher-order connections, expanding the GCN's receptive field and thereby enhancing its ability to learn complex spatial relationships within the data.
Neural Architecture Search (NAS): Implementation of a NAS framework specifically tailored for GCN optimization is a pivotal aspect of this research. The authors utilize an evolutionary strategy that combines sampling efficiency with memory conservation to navigate the space of potential architectures more effectively. This automated process mitigates the need for labor-intensive manual design and yields architectures adept at handling the intricacies of skeleton-based action data.
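
The first two contributions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scalar per-order weights `thetas`, the assumption that the largest Laplacian eigenvalue is 2, and the use of a plain softmax over feature dot products as the "dynamic" graph are all simplifying assumptions made here for illustration. The sketch shows how a K-order Chebyshev expansion lets a joint aggregate features from joints more than one hop away, and how an adjacency matrix can be computed from the features instead of being fixed in advance.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def scaled_laplacian(adj):
    """Scaled Laplacian 2L/lambda_max - I, assuming lambda_max = 2,
    which reduces to -D^{-1/2} A D^{-1/2}."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[-adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def chebyshev_conv(adj, x, thetas):
    """Return sum_k thetas[k] * T_k(L~) x via the Chebyshev recurrence
    T_k = 2 L~ T_{k-1} - T_{k-2}. len(thetas) - 1 is the number of hops."""
    lap = scaled_laplacian(adj)
    terms = [x]                        # T_0(L~) x = x
    if len(thetas) > 1:
        terms.append(matmul(lap, x))   # T_1(L~) x = L~ x
    for _ in range(2, len(thetas)):
        prev = matmul(lap, terms[-1])
        terms.append([[2.0 * p - q for p, q in zip(pr, qr)]
                      for pr, qr in zip(prev, terms[-2])])
    out = [[0.0] * len(x[0]) for _ in x]
    for theta, term in zip(thetas, terms):
        for i, row in enumerate(term):
            for j, value in enumerate(row):
                out[i][j] += theta * value
    return out

def dynamic_adjacency(x):
    """Feature-dependent graph (hypothetical simplification): row-wise
    softmax over pairwise dot products of joint features."""
    scores = matmul(x, list(zip(*x)))  # x @ x^T
    graph = []
    for row in scores:
        top = max(row)
        exps = [math.exp(s - top) for s in row]
        total = sum(exps)
        graph.append([e / total for e in exps])
    return graph

# Toy skeleton: three joints in a chain (0 - 1 - 2), two features per joint.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

one_hop = chebyshev_conv(adj, x, [0.5, 0.5])       # first-order approximation
two_hop = chebyshev_conv(adj, x, [0.5, 0.3, 0.2])  # joint 0 now also sees joint 2
learned_graph = dynamic_adjacency(x)               # rows sum to 1
```

In the first-order case, joint 0 only receives features from its neighbor, joint 1; adding the second-order term mixes in joint 2 as well. In a trained network the weights would be learned matrices rather than scalars, and the similarity would typically be computed on learned embeddings of the features.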

Experimental Evaluation

The authors evaluate the searched architecture on two large-scale datasets, NTU RGB+D and Kinetics-Skeleton, and report state-of-the-art results on both. This supports the effectiveness of dynamic graph construction and higher-order connectivity in capturing the complex dynamics of human action sequences. On NTU RGB+D, the experiments cover both the cross-subject and cross-view evaluation protocols, indicating that the searched architecture is robust to changes in performer and camera viewpoint.

Implications and Future Directions

This paper's findings have significant implications for the future development of GCNs in action recognition and similar tasks involving non-Euclidean data. By automating the architecture design process through NAS, researchers and practitioners can reduce development time and resources while enhancing model performance. The introduction of temporal dynamics and higher-order connections suggests new avenues for improving other GCN applications, such as social network analysis and molecular graph interpretation.

Looking forward, extending this methodology to jointly optimize temporal cues and spatial dependencies is one direction for future research. More sophisticated NAS strategies could also uncover better configurations tailored to other non-Euclidean datasets across domains.

In conclusion, this paper propels the understanding and application of GCNs for action recognition by introducing a systematic approach to automatically and dynamically optimize graph structures, which could influence a broad spectrum of future research in graph-based learning paradigms.
