PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition (2405.06929v1)
Abstract: Recognizing human actions from point cloud sequences has attracted tremendous attention from both academia and industry due to its wide applications. However, most previous studies on point cloud action recognition typically require complex networks to extract intra-frame spatial features and inter-frame temporal features, resulting in an excessive number of redundant computations. This leads to high latency, rendering them impractical for real-world applications. To address this problem, we propose a Plane-Fit Redundancy Encoding point cloud sequence network named PRENet. The primary concept of our approach is to use plane fitting to mitigate spatial redundancy within the sequence, while encoding the temporal redundancy of the entire sequence to minimize redundant computations. Specifically, our network comprises two principal modules: a Plane-Fit Embedding module and a Spatio-Temporal Consistency Encoding module. The Plane-Fit Embedding module capitalizes on the observation that successive point cloud frames exhibit unique geometric features in physical space, allowing spatially encoded data to be reused for temporal stream encoding. The Spatio-Temporal Consistency Encoding module combines the temporal structure of the temporally redundant part with its corresponding spatial arrangement, thereby enhancing recognition accuracy. We conducted extensive experiments to verify the effectiveness of our network. The experimental results demonstrate that our method achieves almost identical recognition accuracy while being nearly four times faster than other state-of-the-art methods.
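The abstract does not specify how the Plane-Fit Embedding module performs its fit, so as an illustration only, the sketch below shows one standard way a local plane can be fit to a point cloud patch (least-squares via SVD) and how point-to-plane residuals could then flag nearly planar, spatially redundant points. The function names and the residual threshold are assumptions for this example, not details from the paper.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit via SVD.

    points: (N, 3) array of xyz coordinates.
    Returns (centroid, unit_normal) of the best-fit plane.
    """
    centroid = points.mean(axis=0)
    # The right-singular vector with the smallest singular value
    # is the direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[-1]

def planar_residuals(points, centroid, normal):
    """Absolute point-to-plane distances."""
    return np.abs((points - centroid) @ normal)

# Example: a noisy patch of the z = 0 plane.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(256, 3))
pts[:, 2] = 0.01 * rng.standard_normal(256)  # nearly planar along z

centroid, normal = fit_plane(pts)
residuals = planar_residuals(pts, centroid, normal)
# Points with small residuals lie close to the fitted plane and
# could be treated as spatially redundant (illustrative threshold).
redundant_mask = residuals < 0.05
```

In a sequence setting, such a fit would be applied per local neighborhood and per frame; points whose residuals stay small across consecutive frames are candidates for reuse rather than re-encoding.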