Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition (2404.02624v1)

Published 3 Apr 2024 in cs.CV

Abstract: Skeleton-based gesture recognition methods have achieved considerable success using Graph Convolutional Networks (GCNs). In addition, context-dependent adaptive topology, which supplies neighborhood vertex information, and attention mechanisms help a model better represent actions. In this paper, we propose a self-attention GCN hybrid model, the Multi-Scale Spatial-Temporal self-attention (MSST)-GCN, to effectively improve modeling ability and achieve state-of-the-art results on several datasets. We utilize a spatial self-attention module with adaptive topology to understand intra-frame interactions among different body parts, and a temporal self-attention module to examine correlations between frames of a node. These two are followed by a multi-scale convolution network with dilations, which captures not only the long-range temporal dependencies of joints but also the long-range spatial dependencies (i.e., long-distance dependencies) of node temporal behaviors. They are combined into high-level spatial-temporal representations, and the predicted action is output by a softmax classifier.
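
The pipeline described in the abstract (spatial self-attention within a frame, temporal self-attention per node, then multi-scale dilated temporal convolution) can be illustrated with a rough, dependency-free sketch. This is not the authors' implementation: every function name here is hypothetical, the query/key/value projections are replaced by the identity, the adaptive topology is modeled as a plain additive bias on the attention scores, the convolution uses a fixed averaging kernel instead of learned weights, and the branches are averaged, which may differ from how the paper merges them.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, bias=None):
    # Scaled dot-product self-attention over a list of feature vectors.
    # ASSUMPTION: identity Q/K/V projections; a real model learns them.
    # ASSUMPTION: `bias` (N x N) stands in for an adaptive topology term.
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        if bias is not None:
            scores = [s + bias[i][j] for j, s in enumerate(scores)]
        w = softmax(scores)
        out.append([sum(w[j] * tokens[j][c] for j in range(len(tokens)))
                    for c in range(d)])
    return out

def spatial_attention(x, topology=None):
    # x[t][v][c]: attend across joints v inside each frame t.
    return [self_attention(frame, topology) for frame in x]

def temporal_attention(x):
    # Attend across frames t separately for each joint v.
    T, V = len(x), len(x[0])
    out = [[None] * V for _ in range(T)]
    for v in range(V):
        seq = self_attention([x[t][v] for t in range(T)])
        for t in range(T):
            out[t][v] = seq[t]
    return out

def dilated_temporal_conv(x, dilation, kernel=(1 / 3, 1 / 3, 1 / 3)):
    # 3-tap temporal convolution with zero padding.
    # ASSUMPTION: fixed averaging kernel shared by all joints and channels.
    T, V, C = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * C for _ in range(V)] for _ in range(T)]
    for t in range(T):
        for offset, w in zip((-1, 0, 1), kernel):
            s = t + offset * dilation
            if 0 <= s < T:
                for v in range(V):
                    for c in range(C):
                        out[t][v][c] += w * x[s][v][c]
    return out

def multi_scale_temporal(x, dilations=(1, 2)):
    # Run one branch per dilation rate and average them elementwise.
    branches = [dilated_temporal_conv(x, d) for d in dilations]
    n = len(branches)
    return [[[sum(b[t][v][c] for b in branches) / n
              for c in range(len(x[0][0]))]
             for v in range(len(x[0]))]
            for t in range(len(x))]

def msst_block(x, topology=None):
    # Spatial attention -> temporal attention -> multi-scale dilated conv.
    return multi_scale_temporal(temporal_attention(spatial_attention(x, topology)))

# Toy input: 5 frames, 3 joints, 2 channels, identical features everywhere.
x = [[[1.0, 2.0] for _ in range(3)] for _ in range(5)]
y = msst_block(x)
```

The output keeps the input layout `[frame][joint][channel]`, so a classifier head (for example global pooling followed by a softmax layer) could be attached afterwards. Larger dilation rates widen the temporal window each output frame sees without adding taps, which is the usual motivation for combining several rates.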

