
Dynamic 3D Point Cloud Sequences as 2D Videos (2403.01129v2)

Published 2 Mar 2024 in cs.CV

Abstract: Dynamic 3D point cloud sequences serve as one of the most common and practical representation modalities of dynamic real-world environments. However, their unstructured nature in both spatial and temporal domains poses significant challenges to effective and efficient processing. Existing deep point cloud sequence modeling approaches imitate the mature 2D video learning mechanisms by developing complex spatio-temporal point neighbor grouping and feature aggregation schemes, often resulting in methods lacking effectiveness, efficiency, and expressive power. In this paper, we propose a novel generic representation called \textit{Structured Point Cloud Videos} (SPCVs). Intuitively, by leveraging the fact that 3D geometric shapes are essentially 2D manifolds, SPCV re-organizes a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points. The structured nature of our SPCV representation allows for the seamless adaptation of well-established 2D image/video techniques, enabling efficient and effective processing and analysis of 3D point cloud sequences. To achieve such re-organization, we design a self-supervised learning pipeline that is geometrically regularized and driven by self-reconstructive and deformation field learning objectives. Additionally, we construct SPCV-based frameworks for both low-level and high-level 3D point cloud sequence processing and analysis tasks, including action recognition, temporal interpolation, and compression. Extensive experiments demonstrate the versatility and superiority of the proposed SPCV, which has the potential to offer new possibilities for deep learning on unstructured 3D point cloud sequences. Code will be released at https://github.com/ZENGYIMING-EAMON/SPCV.
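To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' released code) of the data layout an SPCV implies: each frame of a point cloud sequence is packed into an H×W "geometry image" whose three channels store the x, y, z coordinates of points, so the whole sequence becomes a (T, H, W, 3) video that standard 2D image/video operators can consume. The grid resolution and the random-sampling "projection" used to fill the grid here are illustrative assumptions only; the actual SPCV mapping is learned with spatial-smoothness and temporal-consistency regularization via the paper's self-supervised pipeline.

```python
import numpy as np

def frame_to_spcv_image(points, H=32, W=32, seed=0):
    """Illustrative only: pack an (N, 3) point set into an (H, W, 3) geometry image.

    The real SPCV pipeline learns a spatially smooth, temporally consistent
    re-organization; here we simply sample H*W points (with replacement if the
    frame has fewer points) to show the target data layout.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=H * W, replace=len(points) < H * W)
    return points[idx].reshape(H, W, 3)

def spcv_image_to_frame(image):
    """Recover an (H*W, 3) point set from the (H, W, 3) geometry image."""
    return image.reshape(-1, 3)

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two point sets (brute force),
    the kind of self-reconstructive objective the abstract refers to."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (|p|, |q|)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Toy sequence: T frames of N points each. Pixel values are 3D coordinates,
# so the sequence becomes a (T, H, W, 3) "video" for 2D architectures.
T, N = 4, 2048
sequence = [np.random.rand(N, 3).astype(np.float32) for _ in range(T)]
video = np.stack([frame_to_spcv_image(f) for f in sequence])  # (4, 32, 32, 3)
recon = spcv_image_to_frame(video[0])
print(video.shape, chamfer_distance(sequence[0], recon))
```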

Authors (5)
  1. Yiming Zeng
  2. Junhui Hou
  3. Qijian Zhang
  4. Siyu Ren
  5. Wenping Wang
