
GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition (2403.15212v2)

Published 22 Mar 2024 in cs.CV

Abstract: Skeleton-based action recognition (SAR) in videos is an important but challenging task in computer vision. Recent state-of-the-art (SOTA) models for SAR are primarily based on graph convolutional neural networks (GCNs), which are powerful at extracting the spatial information of skeleton data. However, it is not yet clear whether such GCN-based models can effectively capture the temporal dynamics of human action sequences. To this end, we propose the G-Dev layer, which exploits the path development, a principled and parsimonious representation of sequential data that leverages Lie group structure. By integrating the G-Dev layer, the hybrid G-DevLSTM module enhances the traditional LSTM to reduce the time dimension while retaining high-frequency information. It can be conveniently applied to any temporal graph data, complementing existing advanced GCN-based models. Our empirical studies on the NTU60, NTU120 and Chalearn2013 datasets demonstrate that the proposed GCN-DevLSTM network consistently improves on strong GCN baseline models and achieves SOTA results with superior robustness in SAR tasks. The code is available at https://github.com/DeepIntoStreams/GCN-DevLSTM.
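
To make the idea of a path development more concrete, the following is a minimal NumPy sketch and not the authors' G-Dev layer. It assumes a piecewise-linear path, a linear map from each increment into the Lie algebra so(m) of skew-symmetric matrices, and a development formed as the ordered product of the matrix exponentials of those images. The names `expm`, `path_development`, and `weights` are hypothetical and chosen only for illustration; in a trained model `weights` would be learned rather than sampled at random.

```python
import numpy as np


def expm(a, terms=12, squarings=8):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    a = a / (2 ** squarings)          # scale down so the Taylor series converges fast
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ a / k           # a^k / k!
        result = result + term
    for _ in range(squarings):        # undo the scaling: exp(a) = exp(a / 2^s)^(2^s)
        result = result @ result
    return result


def path_development(path, weights):
    """Develop a piecewise-linear path onto the rotation group SO(m).

    path:    array of shape (T, d), one d-dimensional point per time step.
    weights: array of shape (d, m, m), hypothetical parameters of the linear map
             from R^d into m x m matrices (assumed here, not taken from the paper).
    Returns an m x m matrix summarising the whole sequence.
    """
    m = weights.shape[1]
    skew = weights - weights.transpose(0, 2, 1)      # project onto so(m)
    dev = np.eye(m)
    for dx in np.diff(path, axis=0):                 # increments of the path
        generator = np.tensordot(dx, skew, axes=1)   # linear map applied to dx
        dev = dev @ expm(generator)                  # ordered product of exponentials
    return dev


rng = np.random.default_rng(0)
path = rng.normal(size=(20, 3))        # 20 time steps, 3 channels
weights = rng.normal(size=(3, 4, 4))   # maps each increment to a 4 x 4 matrix
dev = path_development(path, weights)
```

In this sketch a sequence of 20 time steps is reduced to a single 4 x 4 matrix, which illustrates how a development can shorten the time dimension. Because each factor is the exponential of a skew-symmetric matrix, the result stays orthogonal, and because only increments enter the product, translating the whole path leaves the output unchanged.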
