Space-Time-Separable Graph Convolutional Network for Pose Forecasting (2110.04573v1)

Published 9 Oct 2021 in cs.CV

Abstract: Human pose forecasting is a complex structured-data sequence-modelling task, which has received increasing attention, also due to numerous potential applications. Research has mainly addressed the temporal dimension as time series and the interaction of human body joints with a kinematic tree or by a graph. This has decoupled the two aspects and leveraged progress from the relevant fields, but it has also limited the understanding of the complex structural joint spatio-temporal dynamics of the human pose. Here we propose a novel Space-Time-Separable Graph Convolutional Network (STS-GCN) for pose forecasting. For the first time, STS-GCN models the human pose dynamics only with a graph convolutional network (GCN), including the temporal evolution and the spatial joint interaction within a single-graph framework, which allows the cross-talk of motion and spatial correlations. Concurrently, STS-GCN is the first space-time-separable GCN: the space-time graph connectivity is factored into space and time affinity matrices, which bottlenecks the space-time cross-talk, while enabling full joint-joint and time-time correlations. Both affinity matrices are learnt end-to-end, which results in connections substantially deviating from the standard kinematic tree and the linear-time time series. In experimental evaluation on three complex, recent and large-scale benchmarks, Human3.6M [Ionescu et al. TPAMI'14], AMASS [Mahmood et al. ICCV'19] and 3DPW [Von Marcard et al. ECCV'18], STS-GCN outperforms the state-of-the-art, surpassing the current best technique [Mao et al. ECCV'20] by over 32% in average at the most difficult long-term predictions, while only requiring 1.7% of its parameters. We explain the results qualitatively and illustrate the graph interactions by the factored joint-joint and time-time learnt graph connections. Our source code is available at: https://github.com/FraLuca/STSGCN

Citations (131)

Summary

  • The paper presents a novel STS-GCN that integrates spatial and temporal dynamics for improved human pose forecasting.
  • It employs learnable space and time affinity matrices to effectively model joint and temporal interactions in a unified framework.
  • It outperforms state-of-the-art methods with over 32% improvement in long-term predictions while using only 1.7% of the parameters.

A Comprehensive Review of the Space-Time-Separable Graph Convolutional Network for Pose Forecasting

The paper "Space-Time-Separable Graph Convolutional Network for Pose Forecasting" introduces a novel approach to human pose forecasting based on a space-time-separable Graph Convolutional Network (GCN). Human pose forecasting is inherently complex because of the structured nature of the data: understanding how joints interact across both time and space is vital. This work advances the modelling of these dynamics by integrating temporal evolution and spatial joint interaction within a single graph framework.

The primary contribution of this paper is the Space-Time-Separable Graph Convolutional Network (STS-GCN). Unlike traditional approaches that model time and space separately, the STS-GCN factors the space-time graph connectivity into separate space and time affinity matrices. This factorization is key: it bottlenecks the space-time interaction (the cross-talk) while still allowing full joint-joint and time-time correlations. Both affinity matrices are learned end-to-end, so the connectivity adapts to the data, a significant departure from the fixed kinematic tree and linear-time series models of prior work.
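The core idea of the factorization can be sketched numerically. The following NumPy snippet is a simplified illustration, not the authors' implementation (their layers also interleave channel-mixing convolutions); the sizes, random initialisation, and variable names are assumptions for demonstration. It shows that applying a spatial affinity within each frame and a temporal affinity within each joint trajectory is equivalent to one large Kronecker-product space-time operator, while needing far fewer parameters:

```python
import numpy as np

V, T, C = 22, 10, 3   # joints, frames, channels (illustrative sizes)

rng = np.random.default_rng(0)
# Affinity matrices (learned end-to-end in the paper; random here for illustration):
A_s = rng.standard_normal((V, V))   # joint-joint (spatial) affinity
A_t = rng.standard_normal((T, T))   # time-time (temporal) affinity

X = rng.standard_normal((T, V, C))  # pose sequence: frames x joints x channels

# Separable propagation: spatial graph within each frame,
# then temporal graph along each joint trajectory.
X_space = np.einsum('vw,twc->tvc', A_s, X)
X_out = np.einsum('ts,svc->tvc', A_t, X_space)

# The equivalent unfactored operator is the Kronecker product A_t (x) A_s,
# acting on the flattened (T*V, C) sequence:
A_full = np.kron(A_t, A_s)                              # (T*V, T*V)
X_out_full = (A_full @ X.reshape(T * V, C)).reshape(T, V, C)
assert np.allclose(X_out, X_out_full)

# Parameter comparison: factored vs full space-time adjacency
print(A_s.size + A_t.size)   # V^2 + T^2 = 584
print(A_full.size)           # (V*T)^2 = 48400
```

The bottleneck is visible in the parameter counts: the factored form learns V² + T² affinity entries instead of (V·T)², which is one source of the model's parameter efficiency.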

The results presented in the paper are compelling. On three large-scale benchmarks, Human3.6M, AMASS, and 3DPW, the STS-GCN consistently surpasses the state of the art, outperforming the previous best method by over 32% on the most difficult long-term predictions while requiring only 1.7% of its parameters. This highlights the effectiveness of the space-time-separable framework in terms of both accuracy and computational efficiency.

The implications of this research are significant. Practically, this model can be utilized in multiple domains such as autonomous systems where human pose forecasting is critical, aiding in tasks from pedestrian detection to robotic path planning around humans. From a theoretical perspective, the paper opens avenues for the development of further optimized neural network models that leverage separable dimensions for learning complex data interactions, potentially influencing future research in other structured sequence-modelling tasks beyond human motion forecasting.

In terms of future developments, this work paves the way for graph-based modelling of other complex systems where space-time interactions are critical. Subsequent research could extend the model with additional contextual information or adapt the framework to real-time applications, broadening its utility and applicability.

Overall, "Space-Time-Separable Graph Convolutional Network for Pose Forecasting" constitutes a robust contribution to the field of human pose forecasting. By innovatively bridging spatial and temporal data within a singular, efficient learning framework, it marks a valuable step toward more sophisticated and accurate predictive models in structured data contexts.