SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion (2306.15349v1)

Published 27 Jun 2023 in cs.CV and cs.AI

Abstract: Semantic scene completion (SSC) jointly predicts the semantics and geometry of the entire 3D scene, playing an essential role in 3D scene understanding for autonomous driving systems. SSC has progressed rapidly with the help of semantic context from segmentation. However, how to effectively exploit the relationship between the semantic context in semantic segmentation and the geometric structure in scene completion remains underexplored. In this paper, we propose to solve outdoor SSC from the perspective of representation separation and BEV fusion. Specifically, we present a network, named SSC-RS, which uses separate branches with deep supervision to explicitly disentangle the learning of semantic and geometric representations. A BEV fusion network equipped with the proposed Adaptive Representation Fusion (ARF) module then aggregates multi-scale features effectively and efficiently. Owing to its low computational burden and powerful representation ability, our model generalizes well while running in real time. Extensive experiments on SemanticKITTI demonstrate that SSC-RS achieves state-of-the-art performance.
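The abstract describes fusing a semantic branch and a geometric branch in bird's-eye view with adaptive, learned weights. As a rough illustration of that idea (the paper's actual ARF module is not specified in the abstract, so the function name, weight shapes, and the softmax gating below are all assumptions), one can weight the two BEV feature maps per position and take their convex combination:

```python
import numpy as np

def adaptive_representation_fusion(sem_bev, geo_bev, w_sem, w_geo):
    """Sketch of adaptive fusion of two BEV feature maps.

    sem_bev, geo_bev: (C, H, W) features from the semantic and
    geometric branches. w_sem, w_go: (C,) projection vectors that
    stand in for a learned 1x1 convolution producing per-position
    scores. All names are illustrative, not the paper's API.
    """
    # Per-position scalar score for each branch (1x1-conv analogue).
    s_sem = np.tensordot(w_sem, sem_bev, axes=([0], [0]))  # (H, W)
    s_geo = np.tensordot(w_geo, geo_bev, axes=([0], [0]))  # (H, W)
    # Numerically stable softmax over the two branches gives
    # adaptive fusion weights that sum to 1 at every BEV cell.
    m = np.maximum(s_sem, s_geo)
    e_sem, e_geo = np.exp(s_sem - m), np.exp(s_geo - m)
    z = e_sem + e_geo
    a_sem, a_geo = e_sem / z, e_geo / z
    # Convex combination of the two representations, broadcast
    # over the channel dimension.
    return sem_bev * a_sem + geo_bev * a_geo
```

Because the gating weights sum to one at each cell, the fused feature is always an elementwise convex combination of the two inputs; a real implementation would learn the projections end-to-end and likely apply them at multiple scales.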
