
Region-Enhanced Feature Learning for Scene Semantic Segmentation (2304.07486v3)

Published 15 Apr 2023 in cs.CV

Abstract: Semantic segmentation in complex scenes relies not only on object appearance but also on object location and the surrounding environment. Nonetheless, it is difficult to model long-range context as pairwise point correlations due to the huge computational cost for large-scale point clouds. In this paper, we propose using regions as the intermediate representation of point clouds, instead of fine-grained points or voxels, to reduce the computational burden. We introduce a novel Region-Enhanced Feature Learning Network (REFL-Net) that leverages region correlations to enhance point feature learning. We design a region-based feature enhancement (RFE) module, which consists of a Semantic-Spatial Region Extraction stage and a Region Dependency Modeling stage. In the first stage, the input points are grouped into a set of regions based on their semantic and spatial proximity. In the second stage, we explore inter-region semantic and spatial relationships by applying a self-attention block to the region features, and then fuse the point features with the region features to obtain more discriminative representations. Our proposed RFE module is plug-and-play and can be integrated with common semantic segmentation backbones. We conduct extensive experiments on the ScanNetV2 and S3DIS datasets and evaluate our RFE module with different segmentation backbones. Our REFL-Net achieves a 1.8% mIoU gain on ScanNetV2 and a 1.7% mIoU gain on S3DIS with negligible additional computational cost compared with the backbone models. Both quantitative and qualitative results demonstrate the powerful long-range context modeling and strong generalization ability of our REFL-Net.
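The abstract describes a two-stage RFE module: points are first grouped into regions, then inter-region dependencies are modeled with self-attention and fused back into per-point features. Since the paper's code is not reproduced here, the following is a minimal PyTorch sketch of that idea, not the authors' implementation: all names (`RegionFeatureEnhancement`, `region_ids`, etc.) are illustrative assumptions, and the Semantic-Spatial Region Extraction stage is stood in for by precomputed region assignments.

```python
import torch
import torch.nn as nn


class RegionFeatureEnhancement(nn.Module):
    """Hypothetical sketch of an RFE-style module (names are illustrative).

    Stage 1 (Semantic-Spatial Region Extraction) is assumed to have already
    assigned each point a region index; here we only pool point features
    into region features. Stage 2 (Region Dependency Modeling) runs
    self-attention over the region features and fuses the attended region
    feature back into each point's feature.
    """

    def __init__(self, feat_dim: int, num_heads: int = 4):
        super().__init__()
        # feat_dim must be divisible by num_heads for MultiheadAttention.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, point_feats: torch.Tensor, region_ids: torch.Tensor):
        # point_feats: (N, C) per-point features from the backbone.
        # region_ids:  (N,) long tensor, region index in [0, R) per point.
        num_regions = int(region_ids.max().item()) + 1
        C = point_feats.size(1)

        # Stage 1 result: average-pool point features into region features.
        region_feats = point_feats.new_zeros(num_regions, C)
        counts = point_feats.new_zeros(num_regions, 1)
        region_feats.index_add_(0, region_ids, point_feats)
        counts.index_add_(0, region_ids, torch.ones_like(point_feats[:, :1]))
        region_feats = region_feats / counts.clamp(min=1)

        # Stage 2: model inter-region dependencies with self-attention.
        r = region_feats.unsqueeze(0)          # (1, R, C)
        attended, _ = self.attn(r, r, r)       # (1, R, C)
        attended = attended.squeeze(0)         # (R, C)

        # Broadcast each point's attended region feature back and fuse.
        per_point_region = attended[region_ids]  # (N, C)
        fused = self.fuse(torch.cat([point_feats, per_point_region], dim=-1))
        return fused                             # (N, C) enhanced features
```

As a usage example under the same assumptions: with 1,024 points, 64-dim backbone features, and 32 regions, `RegionFeatureEnhancement(64)(torch.randn(1024, 64), torch.randint(0, 32, (1024,)))` returns enhanced `(1024, 64)` features for the segmentation head, which is consistent with the module's claimed plug-and-play use on top of a backbone.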
