Semantic Graph Convolutional Networks for 3D Human Pose Regression

Published 6 Apr 2019 in cs.CV | (1904.03345v3)

Abstract: In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression. Current architectures of GCNs are limited to the small receptive field of convolution filters and shared transformation matrix for each node. To address these limitations, we propose Semantic Graph Convolutional Networks (SemGCN), a novel neural network architecture that operates on regression tasks with graph-structured data. SemGCN learns to capture semantic information such as local and global node relationships, which is not explicitly represented in the graph. These semantic relationships can be learned through end-to-end training from the ground truth without additional supervision or hand-crafted rules. We further investigate applying SemGCN to 3D human pose regression. Our formulation is intuitive and sufficient since both 2D and 3D human poses can be represented as a structured graph encoding the relationships between joints in the skeleton of a human body. We carry out comprehensive studies to validate our method. The results prove that SemGCN outperforms state of the art while using 90% fewer parameters.


Summary

The paper presents SemGCN, a model that enhances graph convolutions by integrating semantic edge weights for efficient 3D pose regression.
It employs channel-wise weighting to capture local and global node relationships, reducing parameters by up to 90% compared to existing methods.
The framework integrates image features to further improve accuracy, outperforming state-of-the-art approaches on multiple 3D human pose benchmarks.

Overview of Semantic Graph Convolutional Networks for 3D Human Pose Regression

The paper presents a novel architecture, Semantic Graph Convolutional Networks (SemGCN), targeting regression tasks with graph-structured data, notably applied to 3D human pose regression.

Key Contributions

The authors identify two limitations in existing GCN architectures: the small receptive field of the convolution filters, and a single transformation matrix that is shared by every node. To address these issues, SemGCN learns semantic information, namely local and global node relationships that are not explicitly represented in the graph. These relationships are learned through end-to-end training from the ground truth, without additional supervision or hand-crafted rules.
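The two limitations can be made concrete with a minimal sketch of a standard graph convolution layer. The symmetric normalization with self-loops used here is the common textbook form and is an assumption, as are the function names and the toy 4-node chain graph; none of them are taken from the paper.

```python
import numpy as np

def normalized_adjacency(adj):
    """Add self-loops and apply symmetric degree normalization."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(x, adj, weight):
    """One vanilla graph convolution: the SAME weight matrix transforms every node."""
    return np.maximum(normalized_adjacency(adj) @ x @ weight, 0.0)

# Toy chain graph 0 - 1 - 2 - 3 (illustrative only).
adj = np.zeros((4, 4))
for i in range(3):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))       # 4 nodes, 2 input channels
weight = rng.normal(size=(2, 3))  # shared by all nodes

out = gcn_layer(x, adj, weight)

# Perturb node 0 only; a single layer can pass this at most one hop,
# so the outputs of nodes 2 and 3 stay exactly as they were.
x_perturbed = x.copy()
x_perturbed[0] += 1.0
out_perturbed = gcn_layer(x_perturbed, adj, weight)
far_nodes_unchanged = np.allclose(out[2:], out_perturbed[2:])
```

Stacking layers widens the receptive field by one hop per layer, which is the limitation the non-local layers in SemGCN are meant to relieve.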

Semantic Graph Convolution

SemGCN introduces Semantic Graph Convolution (SemGConv), which extends the standard graph convolution with learnable, channel-wise weights on the edges. These weights let each layer encode local semantic relationships between neighboring nodes that the fixed graph structure does not express, which makes the convolution more expressive. The architecture interleaves SemGConv layers with non-local layers, so that both local and global node relationships are captured.
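A minimal sketch of the idea, under stated assumptions: each output channel gets its own learnable score per edge, the scores are normalized with a softmax over each node's neighbors (self-loop included), and the non-local layer is a plain dot-product attention block with a residual connection. The summary above does not spell out these details, so the normalization choice, the attention form, and all names (`edge_weights`, `sem_gconv`, `non_local`) are illustrative rather than the authors' implementation.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Row-wise softmax that assigns zero weight wherever mask is False."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)

def edge_weights(adj, edge_scores):
    """Per-channel edge weights (C_out, J, J), nonzero only on edges and self-loops."""
    mask = (adj + np.eye(adj.shape[0])) > 0
    return masked_softmax(edge_scores, mask[None, :, :])

def sem_gconv(x, adj, weight, edge_scores):
    """x: (J, C_in), weight: (C_in, C_out), edge_scores: (C_out, J, J), learnable."""
    h = x @ weight                       # shared feature transform, (J, C_out)
    a = edge_weights(adj, edge_scores)   # one weighted graph per output channel
    out = np.einsum("dij,jd->id", a, h)  # channel d aggregates with its own weights
    return np.maximum(out, 0.0)

def non_local(x, w_theta, w_phi, w_g):
    """Assumed dot-product attention over ALL node pairs, plus a residual."""
    scores = (x @ w_theta) @ (x @ w_phi).T
    attn = masked_softmax(scores, np.ones_like(scores, dtype=bool))
    return x + attn @ (x @ w_g)

# Toy chain graph 0 - 1 - 2 - 3 with random parameters (illustrative only).
adj = np.zeros((4, 4))
for i in range(3):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))
weight = rng.normal(size=(2, 3))
edge_scores = rng.normal(size=(3, 4, 4))

local = sem_gconv(x, adj, weight, edge_scores)  # local, edge-aware step
mixed = non_local(local, rng.normal(size=(3, 2)),
                  rng.normal(size=(3, 2)), rng.normal(size=(3, 3)))  # global step
```

When every edge score is equal, the softmax gives each neighbor the same weight and the layer reduces to plain mean aggregation over the neighborhood; the learned scores are what let different channels emphasize different edges.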

Application to 3D Human Pose Regression

Applying SemGCN to 3D human pose regression is natural: both 2D and 3D human poses can be represented as structured graphs whose edges encode the connections between skeleton joints. Given 2D joint locations as node features, SemGCN regresses the corresponding 3D joint positions. The framework can also take image features as input, which supply perceptual cues that further improve pose estimation.
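The graph representation can be sketched as follows. The 16-joint layout and the bone list are a hypothetical skeleton chosen for illustration, not necessarily the joint ordering used in the paper. The hop-distance computation shows why a purely local convolution needs many layers before distant joints can influence each other.

```python
from collections import deque
import numpy as np

NUM_JOINTS = 16
# Hypothetical joint indices: 0 hip, 1-3 right leg, 4-6 left leg, 7 spine,
# 8 thorax, 9 head, 10-12 left arm, 13-15 right arm.
BONES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8),
         (8, 9), (8, 10), (10, 11), (11, 12), (8, 13), (13, 14), (14, 15)]

def skeleton_adjacency(num_joints, bones):
    """Symmetric 0/1 adjacency matrix with one edge per bone."""
    adj = np.zeros((num_joints, num_joints))
    for a, b in bones:
        adj[a, b] = adj[b, a] = 1.0
    return adj

def hop_distance(adj, source):
    """Breadth-first search: number of bones between source and every joint."""
    dist = np.full(adj.shape[0], -1, dtype=int)
    dist[source] = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in np.flatnonzero(adj[node]):
            if dist[neighbor] < 0:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

adj = skeleton_adjacency(NUM_JOINTS, BONES)

# Node features for the regression task: 2D inputs and 3D targets per joint.
pose_2d = np.zeros((NUM_JOINTS, 2))  # (x, y) for each joint
pose_3d = np.zeros((NUM_JOINTS, 3))  # (x, y, z) to be predicted

# Left wrist (12) to right ankle (3) in this layout.
hops_from_left_wrist = hop_distance(adj, 12)
wrist_to_ankle = int(hops_from_left_wrist[3])
```

In this layout the left wrist and the right ankle are 8 bones apart, so a stack of one-hop convolutions would need 8 layers to relate them, whereas a non-local layer connects them in a single step.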

Numerical Results

The paper reports that SemGCN outperforms state-of-the-art methods on the reported benchmarks while using about 90% fewer parameters than related work. It also outperforms competing methods when image features are included as input.

Theoretical and Practical Implications

Theoretically, SemGCN advances graph-based learning paradigms by efficiently modeling semantic relationships in graph-structured data. Practically, the potential for generalization to other regression tasks makes it a versatile tool in computer vision and beyond.

Future Directions

Future research could explore the extension of SemGCN to temporal data for video analysis or integrate it with sequence-based models. Investigating its application in diverse domains like social networks or molecular biology could further demonstrate its utility.

This paper makes significant strides in addressing limitations of existing GCN frameworks for regression tasks, offering a methodologically sound and computationally efficient approach to human pose estimation in 3D space.
