Learning Scene Flow With Skeleton Guidance For 3D Action Recognition
Abstract: Among the existing modalities for 3D action recognition, 3D flow has been poorly examined, despite conveying rich motion cues for human actions. Presumably, its susceptibility to noise renders it intractable, challenging the learning process within deep models. This work demonstrates the use of 3D flow sequences by a deep spatiotemporal model and further proposes an incremental two-level spatial attention mechanism, guided by the skeleton domain, that emphasizes motion features close to body joint areas according to their informativeness. To this end, an extended deep skeleton model is also introduced to learn the most discriminant action motion dynamics and to estimate an informativeness score for each joint. Subsequently, a late fusion scheme is adopted between the two models for learning high-level cross-modal correlations. Experimental results on NTU RGB+D, currently the largest and most challenging dataset, demonstrate the effectiveness of the proposed approach, which achieves state-of-the-art results.
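At a high level, the mechanism summarized in the abstract can be read as a spatial re-weighting of flow features around projected joint positions, scaled by per-joint informativeness scores coming from the skeleton stream, followed by score-level fusion of the two models. The NumPy sketch below illustrates this reading only; the feature-map layout, the Gaussian neighbourhood around each joint, the softmax normalisation of the scores, and the fusion weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of skeleton-guided two-level spatial attention over flow features.
# All shapes, the Gaussian mask, and the fusion rule are assumptions for illustration.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def skeleton_guided_attention(flow_feat, joint_xy, joint_scores, sigma=2.0):
    """Re-weight a flow feature map around body-joint locations.

    flow_feat    : (H, W, C) flow features for one frame.
    joint_xy     : (J, 2) joint coordinates projected onto the H x W grid.
    joint_scores : (J,) informativeness scores from the skeleton model.
    """
    H, W, _ = flow_feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    weights = softmax(joint_scores)          # level 2: per-joint informativeness
    mask = np.zeros((H, W))
    for (jx, jy), w in zip(joint_xy, weights):
        # level 1: emphasise features spatially close to each joint
        mask += w * np.exp(-((xs - jx) ** 2 + (ys - jy) ** 2) / (2 * sigma ** 2))
    mask /= mask.max() + 1e-8                # keep attention values in [0, 1]
    return flow_feat * mask[..., None]

def late_fusion(flow_logits, skel_logits, alpha=0.5):
    """Simple score-level fusion of the flow and skeleton streams."""
    return alpha * flow_logits + (1.0 - alpha) * skel_logits
```

A typical call would pass one time step's flow feature map together with the joints projected onto that map's spatial grid; the actual model presumably learns these attention weights end-to-end rather than using a fixed Gaussian.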