AVS-Net: Point Sampling with Adaptive Voxel Size for 3D Scene Understanding

Published 27 Feb 2024 in cs.CV (arXiv:2402.17521v3)

Abstract: Recent advances in point cloud learning have enabled intelligent vehicles and robots to better comprehend 3D environments. However, processing large-scale 3D scenes remains challenging, so efficient downsampling methods play a crucial role in point cloud learning. Existing downsampling methods either impose a heavy computational burden or sacrifice fine-grained geometric information. To address this, this paper presents an advanced sampler that achieves both high accuracy and efficiency. The proposed method builds on voxel centroid sampling while effectively addressing the challenges of voxel size determination and the preservation of critical geometric cues. Specifically, we propose a Voxel Adaptation Module that adaptively adjusts voxel sizes with reference to a point-based downsampling ratio, ensuring that the sampling results exhibit a distribution favorable for comprehending various 3D objects and scenes. Meanwhile, we introduce a network compatible with arbitrary voxel sizes for sampling and feature extraction while maintaining high efficiency. The approach is demonstrated on 3D object detection and 3D semantic segmentation. Compared to existing state-of-the-art methods, it achieves better accuracy on large-scale outdoor and indoor datasets, e.g., Waymo and ScanNet, with promising efficiency.
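The two ingredients the abstract names, voxel centroid sampling and a voxel size adapted to match a point-based downsampling ratio, can be illustrated with a minimal NumPy sketch. This is not the paper's Voxel Adaptation Module (which is learned and network-integrated); the binary search over voxel size, the function names, and all parameters below are illustrative assumptions showing only the basic idea of tuning voxel size so the kept-point ratio hits a target.

```python
import numpy as np

def voxel_centroid_sample(points, voxel_size):
    """Downsample a point cloud by averaging the points in each occupied voxel."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    # inverse maps each point to the index of its (unique) voxel
    _, inverse = np.unique(coords, axis=0, return_inverse=True)
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, points.shape[1]))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)   # accumulate coordinates per voxel
    np.add.at(counts, inverse, 1)      # count points per voxel
    return sums / counts[:, None]      # one centroid per occupied voxel

def adaptive_voxel_size(points, target_ratio, lo=0.01, hi=10.0, iters=20):
    """Binary-search a voxel size whose centroid sampling keeps roughly
    target_ratio of the input points (hypothetical stand-in for a learned
    adaptation module). Larger voxels keep fewer centroids."""
    n = len(points)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        kept = len(voxel_centroid_sample(points, mid)) / n
        if kept > target_ratio:
            lo = mid   # too many points survive: enlarge voxels
        else:
            hi = mid   # too few survive: shrink voxels
    return 0.5 * (lo + hi)
```

The sketch relies on the fact that the fraction of points surviving voxel centroid sampling decreases (roughly monotonically) as the voxel grows, so a scalar search can align the voxel-based sampler with any point-based downsampling ratio, which is the distributional property the abstract attributes to the Voxel Adaptation Module.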
