Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
126 tokens/sec
GPT-4o
47 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video (2407.10099v1)

Published 14 Jul 2024 in cs.CV

Abstract: The current methods of video-based 3D human pose estimation have achieved significant progress; however, they continue to confront the significant challenge of depth ambiguity. To address this limitation, this paper presents the spatio-temporal GraphFormer framework for 3D human pose estimation in video, which integrates body structure graph-based representations with spatio-temporal information. Specifically, we develop a spatio-temporal criss-cross graph (STG) attention mechanism. This approach is designed to learn the long-range dependencies in data across both time and space, integrating graph information directly into the respective attention layers. Furthermore, we introduce the dual-path modulated hop-wise regular GCN (MHR-GCN) module, which utilizes modulation to optimize parameter usage and employs spatio-temporal hop-wise skip connections to acquire higher-order information. Additionally, this module processes temporal and spatial dimensions independently to learn their respective features while avoiding mutual influence. Finally, we demonstrate that our method achieves state-of-the-art performance in 3D human pose estimation on the Human3.6M and MPI-INF-3DHP datasets.

Summary

We haven't generated a summary for this paper yet.