
Camera-based 3D Semantic Scene Completion with Sparse Guidance Network (2312.05752v2)

Published 10 Dec 2023 in cs.CV

Abstract: Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to process the lifted 3D features directly, and these features are not discriminative enough to yield clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose a one-stage camera-based SSC framework, termed SGN, to propagate semantics from the semantic-aware seed voxels to the whole scene based on spatial geometry cues. First, to exploit depth-aware context and dynamically select sparse seed voxels, we redesign the sparse voxel proposal network to directly process points generated by depth prediction, following a coarse-to-fine paradigm. Furthermore, by designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial geometry cues, we enhance the feature separation between different categories and expedite the convergence of semantic propagation. Finally, we devise a multi-scale semantic propagation module that provides flexible receptive fields while reducing computational cost. Extensive experimental results on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate the superiority of our SGN over existing state-of-the-art methods. Even our lightweight version, SGN-L, achieves notable scores of 14.80% mIoU and 45.45% IoU on the SemanticKITTI validation set with only 12.5 M parameters and 7.16 G of training memory. Code is available at https://github.com/Jieqianyu/SGN.
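
The abstract reports two scores, IoU and mIoU. As a rough illustration of how such scores are commonly computed for SSC, the sketch below treats IoU as class-agnostic occupancy overlap and mIoU as the mean per-class IoU over non-empty classes. It is not taken from the SGN code base: the function names, the label conventions (0 for empty, 255 for ignored voxels), and the choice to skip classes that appear in neither prediction nor ground truth are all assumptions made for this example.

```python
from typing import Sequence

EMPTY = 0     # assumed label for free space
IGNORE = 255  # assumed label for voxels excluded from evaluation


def completion_iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Class-agnostic IoU: occupied vs. empty voxels."""
    tp = fp = fn = 0
    for p, g in zip(pred, gt):
        if g == IGNORE:
            continue
        p_occ, g_occ = p != EMPTY, g != EMPTY
        if p_occ and g_occ:
            tp += 1
        elif p_occ:
            fp += 1
        elif g_occ:
            fn += 1
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def semantic_miou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Mean IoU over semantic classes 1..num_classes-1 (empty class excluded)."""
    ious = []
    for c in range(1, num_classes):
        tp = fp = fn = 0
        for p, g in zip(pred, gt):
            if g == IGNORE:
                continue
            if p == c and g == c:
                tp += 1
            elif p == c:
                fp += 1
            elif g == c:
                fn += 1
        denom = tp + fp + fn
        if denom:  # assumption: skip classes absent from both pred and gt
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example on six flattened voxels with two semantic classes.
pred = [0, 1, 1, 2, 2, 0]
gt = [0, 1, 2, 2, 255, 1]
print(completion_iou(pred, gt))    # occupancy overlap
print(semantic_miou(pred, gt, 3))  # mean over classes 1 and 2
```

In the toy example, a voxel predicted as the wrong non-empty class still counts as a hit for IoU but as an error for mIoU. That is one reason the completion IoU reported for a method is typically much higher than its mIoU.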
