
Soft Masked Transformer for Point Cloud Processing with Skip Attention-Based Upsampling (2403.14124v1)

Published 21 Mar 2024 in cs.CV

Abstract: Point cloud processing methods leverage local and global point features to cater to downstream tasks, yet they often overlook the task-level context inherent in point clouds during the encoding stage. We argue that integrating task-level information into the encoding stage significantly enhances performance. To that end, we propose SMTransformer, which incorporates task-level information into a vector-based transformer by utilizing a soft mask, generated from task-level queries and keys, to learn the attention weights. Additionally, to facilitate effective communication between features from the encoding and decoding layers in high-level tasks such as segmentation, we introduce a skip-attention-based up-sampling block. This block dynamically fuses features from various resolution points across the encoding and decoding layers. To mitigate the increase in network parameters and training time resulting from the complexity of the aforementioned blocks, we propose a novel shared position encoding strategy. This strategy allows various transformer blocks to share the same position information over the same resolution points, thereby reducing network parameters and training time without compromising accuracy. Experimental comparisons with existing methods on multiple datasets demonstrate the efficacy of SMTransformer and skip-attention-based up-sampling for point cloud processing tasks, including semantic segmentation and classification. In particular, we achieve state-of-the-art semantic segmentation results of 73.4% mIoU on S3DIS Area 5 and 62.4% mIoU on the SWAN dataset.
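The abstract describes the two mechanisms only in prose; the following is a minimal PyTorch-style sketch of how they might look, assuming a Point-Transformer-style vector attention backbone. All module names, tensor shapes, the sigmoid gating, and the use of a class-logit space for the task-level projections are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class SoftMaskedVectorAttention(nn.Module):
    """Sketch of soft-masked vector attention: task-level queries and keys
    (here projected onto an assumed class-logit space) produce a soft mask
    in [0, 1] that modulates the feature-level attention weights."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        # Feature-level projections for standard vector attention.
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Task-level projections used only to form the soft mask
        # (hypothetical; the paper's exact task space may differ).
        self.task_q = nn.Linear(dim, num_classes)
        self.task_k = nn.Linear(dim, num_classes)
        # MLP mapping pairwise relations to per-channel attention weights.
        self.weight_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(inplace=True), nn.Linear(dim, dim)
        )

    def forward(self, x: torch.Tensor, pos_enc: torch.Tensor) -> torch.Tensor:
        # x:       (N, K, dim) features of each point's K nearest neighbors,
        #          with the center point's own feature at index 0.
        # pos_enc: (N, K, dim) relative position encoding; passing it in,
        #          rather than recomputing it per block, mirrors the shared
        #          position encoding idea for blocks at the same resolution.
        q = self.to_q(x[:, :1])                    # (N, 1, dim) center query
        k, v = self.to_k(x), self.to_v(x)          # (N, K, dim)
        # Soft mask from task-level similarity between center and neighbors.
        tq, tk = self.task_q(x[:, :1]), self.task_k(x)
        soft_mask = torch.sigmoid((tq * tk).sum(-1, keepdim=True))  # (N, K, 1)
        # Vector attention weights, modulated by the soft mask.
        w = torch.softmax(self.weight_mlp(q - k + pos_enc) * soft_mask, dim=1)
        return (w * (v + pos_enc)).sum(dim=1)      # (N, dim) aggregated feature


class SkipAttentionUpsample(nn.Module):
    """Sketch of skip-attention-based up-sampling: encoder skip features at
    the finer resolution gate decoder features interpolated to the same
    points, instead of plain interpolation plus concatenation."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)  # queries from encoder skip features
        self.to_k = nn.Linear(dim, dim)  # keys from interpolated decoder features
        self.to_v = nn.Linear(dim, dim)

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
        # enc_feat: (N, dim) encoder features at the target (finer) resolution
        # dec_feat: (N, dim) decoder features already interpolated to those points
        q, k, v = self.to_q(enc_feat), self.to_k(dec_feat), self.to_v(dec_feat)
        gate = torch.sigmoid(q * k)      # (N, dim) per-channel fusion weights
        return enc_feat + gate * v       # dynamically fused decoder feature


# Toy usage with random data (shapes are illustrative only).
attn = SoftMaskedVectorAttention(dim=64, num_classes=13)
out = attn(torch.randn(1024, 16, 64), torch.randn(1024, 16, 64))  # -> (1024, 64)
```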

Authors (7)
  1. Yong He (77 papers)
  2. Hongshan Yu (18 papers)
  3. Muhammad Ibrahim (16 papers)
  4. Xiaoyan Liu (22 papers)
  5. Tongjia Chen (5 papers)
  6. Anwaar Ulhaq (25 papers)
  7. Ajmal Mian (136 papers)
