Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition (2107.12213v2)

Published 26 Jul 2021 in cs.CV

Abstract: Graph convolutional networks (GCNs) have been widely used and achieved remarkable results in skeleton-based action recognition. In GCNs, graph topology dominates feature aggregation and therefore is the key to extracting representative features. In this work, we propose a novel Channel-wise Topology Refinement Graph Convolution (CTR-GC) to dynamically learn different topologies and effectively aggregate joint features in different channels for skeleton-based action recognition. The proposed CTR-GC models channel-wise topologies through learning a shared topology as a generic prior for all channels and refining it with channel-specific correlations for each channel. Our refinement method introduces few extra parameters and significantly reduces the difficulty of modeling channel-wise topologies. Furthermore, via reformulating graph convolutions into a unified form, we find that CTR-GC relaxes strict constraints of graph convolutions, leading to stronger representation capability. Combining CTR-GC with temporal modeling modules, we develop a powerful graph convolutional network named CTR-GCN which notably outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.



Summary

The paper introduces CTR-GC, which dynamically refines graph topologies on a per-channel basis to improve skeleton-based action recognition.
It unifies existing graph convolutions into a single mathematical formulation, relaxing rigid constraints and boosting model flexibility.
Experiments report accuracy gains of 1.6% and 2.0% for CTR-GCN over state-of-the-art methods on NTU RGB+D 120 when the joint and bone modalities are combined.

Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition

The paper "Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition" introduces a graph convolution called Channel-wise Topology Refinement Graph Convolution (CTR-GC). It targets skeleton-based action recognition and dynamically refines the graph topology separately for each feature channel.

Context and Methodology

Graph convolutional networks (GCNs) are effective for skeleton-based action recognition, where the graph topology governs how joint features are aggregated and is therefore central to extracting representative features. Methods that use a static topology, or a single topology shared by all channels, limit adaptability, because different channels can encode different kinds of motion features.

CTR-GC addresses this limitation by learning a different topology for each channel. It starts from a shared, parameterized topology that serves as a generic prior for all channels and refines it with channel-specific correlations inferred dynamically from the input. The refinement adds few extra parameters and makes channel-wise topologies considerably easier to model than learning each one independently.
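The refinement described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ctr_gc_sketch`, the projection matrices `W_psi`, `W_phi`, and `W_xi`, the scalar `alpha`, and the choice of a tanh over pairwise feature differences as the correlation function are all assumptions made for the example, since the summary does not specify them.

```python
import numpy as np

def ctr_gc_sketch(x, A, W, W_psi, W_phi, W_xi, alpha=1.0):
    """Illustrative channel-wise topology refinement (hypothetical names).

    x:     (N, C_in)   features of N joints for one frame
    A:     (N, N)      shared topology, the generic prior for all channels
    W:     (C_in, C_out) feature transform
    W_psi, W_phi: (C_in, C_r) projections used to compare joints (assumed)
    W_xi:  (C_r, C_out) maps pairwise correlations to one value per channel
    alpha: scalar strength of the refinement (assumed)
    """
    x_t = x @ W                                   # (N, C_out) transformed features
    p = x @ W_psi                                 # (N, C_r)
    q = x @ W_phi                                 # (N, C_r)
    # Assumed correlation function: tanh of pairwise differences between joints.
    diff = np.tanh(p[:, None, :] - q[None, :, :]) # (N, N, C_r)
    Q = diff @ W_xi                               # (N, N, C_out) channel-specific correlations
    R = A[:, :, None] + alpha * Q                 # (N, N, C_out) refined topology per channel
    # Aggregate each output channel with its own N x N topology.
    z = np.einsum("ijc,jc->ic", R, x_t)           # (N, C_out)
    return z, R

# Usage with random data; the sizes are arbitrary.
rng = np.random.default_rng(0)
N, C_in, C_r, C_out = 25, 8, 2, 16
z, R = ctr_gc_sketch(
    rng.normal(size=(N, C_in)),
    rng.normal(size=(N, N)),
    rng.normal(size=(C_in, C_out)),
    rng.normal(size=(C_in, C_r)),
    rng.normal(size=(C_in, C_r)),
    rng.normal(size=(C_r, C_out)),
)
```

Setting `alpha` to zero makes every channel fall back to the shared topology `A`, which is the sense in which the shared topology acts as a prior. The extra parameters in this sketch are only the three small projection matrices, rather than a full N x N matrix per channel.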

Technical Contributions

Topology Refinement: CTR-GC refines a shared topology with dynamic, channel-specific correlations, so each channel aggregates joint features over its own topology. This gives a more flexible and expressive feature representation than a single shared graph.
Unified Graph Convolution Formulation: The paper reformulates existing graph convolutions into a single mathematical form. Under this form, CTR-GC relaxes strict constraints that other graph convolutions impose, which gives it stronger representation capability.
Experimental Validation: CTR-GC is evaluated in extensive experiments on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets. CTR-GCN, the network that combines CTR-GC with temporal modeling modules, outperforms state-of-the-art methods on all three benchmarks.
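To make the relaxed constraint concrete, the contrast between a shared topology and channel-wise topologies can be written as follows. This is a sketch in assumed notation, not the paper's exact formulation: $x_j$ is the feature row vector of joint $j$, $W$ is the feature transform with $c$-th column $w_c$, $a_{ij}$ is the shared topology, $q_{ij}^{(c)}$ is the channel-specific correlation, and $\alpha$ is a scalar weight.

```latex
\begin{align}
  % Shared topology: every output channel uses the same weights a_{ij}
  z_i &= \sum_{j} a_{ij}\, x_j W \\
  % Channel-wise topology: output channel c uses its own weights r_{ij}^{(c)}
  z_i^{(c)} &= \sum_{j} r_{ij}^{(c)}\, x_j w_c,
  \qquad r_{ij}^{(c)} = a_{ij} + \alpha\, q_{ij}^{(c)}
\end{align}
```

In the first line, the contribution of joint $j$ to joint $i$ is scaled by the same factor in every channel. In the second, that factor can differ per channel, and the shared case is recovered when $q_{ij}^{(c)} = 0$ for all $c$.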

Numerical Results

CTR-GCN achieves accuracy gains of 1.6% and 2.0% on NTU RGB+D 120 when the joint and bone modalities are combined. These results indicate that the refined graph convolution captures the complex motion patterns in skeleton sequences better than the methods it is compared against.

Implications and Future Directions

The development of CTR-GC opens several avenues for enhancing action recognition. Practically, this approach can be applied to improve systems in surveillance, human-computer interaction, and robotics. Theoretically, the flexibility of CTR-GC in adapting topologies dynamically per channel suggests potential applications in other graph-based domains beyond action recognition.

Future research could explore integrating CTR-GC with other innovative neural architectures or extending its use to different types of non-Euclidean data. Additionally, optimizing the computational efficiency and further reducing parameter overhead could make CTR-GC viable for real-time applications.

In conclusion, CTR-GC advances skeleton-based action recognition by pairing a shared topology prior with channel-specific refinement. The results of CTR-GCN on NTU RGB+D, NTU RGB+D 120, and NW-UCLA make it a strong reference point for future graph-based methods.
