Joint Temporal Pooling for Improving Skeleton-based Action Recognition
Abstract: In skeleton-based human action recognition, temporal pooling is a critical step for capturing the spatiotemporal relationships of joint dynamics. Conventional pooling methods treat every frame equally and do not preserve motion information. In an action sequence, however, only a few segments of frames carry information that discriminates the action. This paper presents a novel Joint Motion Adaptive Temporal Pooling (JMAP) method for improving skeleton-based action recognition. Two variants of JMAP are introduced: frame-wise pooling and joint-wise pooling. The efficacy of JMAP has been validated through experiments on the popular NTU RGB+D 120 and PKU-MMD datasets.
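To make the contrast between the two variants concrete, the following is a minimal sketch of motion-based temporal pooling. It is not the authors' implementation: the abstract does not say how JMAP measures or uses motion, so everything here is an assumption made for illustration. Motion is taken to be the frame-to-frame displacement of each joint, pooling is a hard top-k selection, and the names `joint_motion`, `top_k_indices`, `frame_wise_pool`, and `joint_wise_pool` are hypothetical.

```python
import math


def joint_motion(seq):
    """Displacement magnitude of every joint between consecutive frames.

    seq is a list of T frames; each frame is a list of J joint coordinates.
    Assumption: motion = Euclidean distance to the previous frame, and the
    first frame is assigned zero motion.
    """
    num_frames, num_joints = len(seq), len(seq[0])
    motion = [[0.0] * num_joints for _ in range(num_frames)]
    for t in range(1, num_frames):
        for j in range(num_joints):
            motion[t][j] = math.dist(seq[t][j], seq[t - 1][j])
    return motion


def top_k_indices(scores, k):
    """Indices of the k largest scores, returned in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def frame_wise_pool(seq, k):
    """Keep the k whole frames with the largest total joint motion."""
    motion = joint_motion(seq)
    frame_scores = [sum(row) for row in motion]
    return [seq[t] for t in top_k_indices(frame_scores, k)]


def joint_wise_pool(seq, k):
    """Keep, for each joint separately, its k highest-motion frames."""
    motion = joint_motion(seq)
    num_joints = len(seq[0])
    per_joint = []
    for j in range(num_joints):
        keep = top_k_indices([row[j] for row in motion], k)
        per_joint.append([seq[t][j] for t in keep])
    # Transpose from (joint, k) back to (k, joint) so the output is frame-like.
    return [[per_joint[j][i] for j in range(num_joints)] for i in range(k)]
```

In this sketch, frame-wise pooling returns real frames from the sequence, while joint-wise pooling can combine joint positions taken from different time steps into one pooled frame. Whether JMAP uses hard selection or a learned, weighted pooling is not stated in the abstract.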