
3D Learnable Supertoken Transformer for LiDAR Point Cloud Scene Segmentation (2405.15826v1)

Published 23 May 2024 in cs.CV

Abstract: 3D Transformers have achieved great success in point cloud understanding and representation. However, there is still considerable scope for developing effective and efficient Transformers for large-scale LiDAR point cloud scene segmentation. This paper proposes a novel 3D Transformer framework, named 3D Learnable Supertoken Transformer (3DLST). The key contributions are summarized as follows. Firstly, we introduce the first Dynamic Supertoken Optimization (DSO) block for efficient token clustering and aggregation, where the learnable supertoken definition avoids the time-consuming pre-processing of traditional superpoint generation. Since the learnable supertokens can be dynamically optimized by multi-level deep features during network learning, they are tailored to semantic homogeneity-aware token clustering. Secondly, an efficient Cross-Attention-guided Upsampling (CAU) block is proposed for token reconstruction from optimized supertokens. Thirdly, 3DLST is equipped with a novel W-net architecture instead of the common U-net design, which is more suitable for Transformer-based feature learning. The SOTA performance on three challenging LiDAR datasets (airborne MultiSpectral LiDAR (MS-LiDAR), with an average F1 score of 89.3%; DALES, with an mIoU of 80.2%; and Toronto-3D, with an mIoU of 80.4%) demonstrates the superiority of 3DLST and its strong adaptability to various LiDAR point cloud data (airborne MS-LiDAR, aerial LiDAR, and vehicle-mounted LiDAR data). Furthermore, 3DLST also achieves satisfactory algorithm efficiency, running up to 5x faster than previous best-performing methods.
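
The abstract describes two attention-based steps: aggregating point tokens into a small set of supertokens, and reconstructing per-point tokens from those supertokens. The NumPy sketch below is only an illustration of that general idea using plain scaled dot-product cross-attention; it is not the authors' DSO or CAU block. All names, sizes, and the random "supertoken" initialization are assumptions made for the example. In a real network the supertokens would be trainable parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Plain scaled dot-product cross-attention (no learned projections).
    # Returns the attended output and the attention weights.
    scale = np.sqrt(queries.shape[-1])
    weights = softmax(queries @ keys.T / scale, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
num_tokens, num_supertokens, dim = 1024, 32, 16  # hypothetical sizes

# Per-point features (random stand-ins for deep features).
tokens = rng.normal(size=(num_tokens, dim))
# Stand-in for learnable supertokens; here fixed random vectors.
supertokens = rng.normal(size=(num_supertokens, dim))

# Aggregation: each supertoken queries all point tokens.
updated_supertokens, assign = cross_attention(supertokens, tokens, tokens)

# Upsampling: each point token queries the updated supertokens.
reconstructed, up_weights = cross_attention(
    tokens, updated_supertokens, updated_supertokens
)

print(updated_supertokens.shape, reconstructed.shape)
```

Under these assumptions, the cost of each step scales with the number of tokens times the number of supertokens rather than with the square of the number of tokens, which is one common motivation for token clustering in large scenes.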
