
Towards a geometric understanding of Spatio Temporal Graph Convolution Networks (2312.07777v1)

Published 12 Dec 2023 in eess.IV and eess.SP

Abstract: Spatiotemporal graph convolutional networks (STGCNs) have emerged as a desirable model for skeleton-based human action recognition. Despite achieving state-of-the-art performance, there is a limited understanding of the representations learned by these models, which hinders their application in critical and real-world settings. While layerwise analysis of CNN models has been studied in the literature, to the best of our knowledge, there exists no study on the layerwise explainability of the embeddings learned on spatiotemporal data using STGCNs. In this paper, we first propose to use a local Dataset Graph (DS-Graph) obtained from the feature representation of input data at each layer to develop an understanding of the layer-wise embedding geometry of the STGCN. To do so, we develop a window-based dynamic time warping (DTW) method to compute the distance between data sequences with varying temporal lengths. To validate our findings, we have developed a layer-specific Spatiotemporal Graph Gradient-weighted Class Activation Mapping (L-STG-GradCAM) technique tailored for spatiotemporal data. This approach enables us to visually analyze and interpret each layer within the STGCN network. We characterize the functions learned by each layer of the STGCN using the label smoothness of the representation and visualize them using our L-STG-GradCAM approach. Our proposed method is generic and can yield valuable insights for STGCN architectures in different applications. However, this paper focuses on the human activity recognition task as a representative application. Our experiments show that STGCN models learn representations that capture general human motion in their initial layers while discriminating different actions only in later layers. This justifies experimental observations showing that fine-tuning deeper layers works well for transfer between related tasks.
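The abstract relies on dynamic time warping to compare sequences of different temporal lengths. As a rough illustration only, the sketch below implements standard DTW with a band constraint (a Sakoe-Chiba style window) over per-frame feature vectors. It is not the authors' window-based DTW, whose details are not given in the abstract. The function name `dtw_distance`, the `window` parameter, and the use of Euclidean distance between frames are all assumptions made for this example.

```python
import math


def dtw_distance(a, b, window=None):
    """Band-constrained DTW between two sequences of feature vectors.

    a, b   : lists of equal-dimension vectors (one vector per time step);
             the two sequences may have different lengths.
    window : hypothetical band half-width; None means unconstrained DTW.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # The band must be at least |n - m| wide, otherwise the final cell
    # (n, m) cannot be reached when the lengths differ.
    w = max(n, m) if window is None else max(window, abs(n - m))

    inf = math.inf
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        # Only visit cells inside the band |i - j| <= w.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = math.dist(a[i - 1], b[j - 1])  # assumed frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # repeat frame of b
                cost[i][j - 1],      # repeat frame of a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


if __name__ == "__main__":
    short = [[0.0], [1.0], [2.0]]
    stretched = [[0.0], [1.0], [1.0], [2.0]]
    print(dtw_distance(short, stretched, window=1))
```

Pairwise distances of this kind could then feed a neighborhood graph over the data at a given layer, although how the paper builds its DS-Graph from them is not specified in the abstract.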

