Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
125 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

DepthSSC: Monocular 3D Semantic Scene Completion via Depth-Spatial Alignment and Voxel Adaptation (2311.17084v2)

Published 28 Nov 2023 in cs.CV

Abstract: The task of 3D semantic scene completion using monocular cameras is gaining significant attention in the field of autonomous driving. This task aims to predict the occupancy status and semantic labels of each voxel in a 3D scene from partial image inputs. Despite numerous existing methods, many face challenges such as inaccurately predicting object shapes and misclassifying object boundaries. To address these issues, we propose DepthSSC, an advanced method for semantic scene completion using only monocular cameras. DepthSSC integrates the Spatial Transformation Graph Fusion (ST-GF) module with Geometric-Aware Voxelization (GAV), enabling dynamic adjustment of voxel resolution to accommodate the geometric complexity of 3D space. This ensures precise alignment between spatial and depth information, effectively mitigating issues such as object boundary distortion and incorrect depth perception found in previous methods. Evaluations on the SemanticKITTI and SSCBench-KITTI-360 dataset demonstrate that DepthSSC not only captures intricate 3D structural details effectively but also achieves state-of-the-art performance.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (30)
  1. Semantickitti: A dataset for semantic scene understanding of lidar sequences. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9297–9307, 2019.
  2. Adabins: Depth estimation using adaptive bins. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4009–4018, 2021.
  3. Anh-Quan Cao and Raoul de Charette. Monoscene: Monocular 3d semantic scene completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3991–4001, 2022.
  4. S3cnet: A sparse semantic scene completion network for lidar point clouds. In Conference on Robot Learning, pages 2148–2161. PMLR, 2021.
  5. Two stream 3d semantic scene completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019.
  6. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  7. Tri-perspective view for vision-based 3d semantic occupancy prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9223–9232, 2023.
  8. Genetic k-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 29(3):433–439, 1999.
  9. Rgbd based dimensional decomposition residual network for 3d semantic scene completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7693–7702, 2019a.
  10. Depth based semantic scene completion with position importance aware loss. IEEE Robotics and Automation Letters, 5(1):219–226, 2019b.
  11. Anisotropic convolutional networks for 3d semantic scene completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3351–3359, 2020.
  12. Sscbench: A large-scale 3d semantic scene completion benchmark for autonomous driving. arXiv preprint arXiv:2306.09001, 2023a.
  13. Voxformer: Sparse voxel transformer for camera-based 3d semantic scene completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9087–9098, 2023b.
  14. Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. In European conference on computer vision, pages 1–18. Springer, 2022.
  15. Occdepth: A depth-aware method for 3d semantic scene completion. arXiv preprint arXiv:2302.13540, 2023.
  16. A survey of the marching cubes algorithm. Computers & Graphics, 30(5):854–879, 2006.
  17. Semantic scene completion using local deep implicit functions on lidar data. IEEE transactions on pattern analysis and machine intelligence, 44(10):7205–7218, 2021.
  18. Lmscnet: Lightweight multiscale 3d semantic completion. In 2020 International Conference on 3D Vision (3DV), pages 111–119. IEEE, 2020.
  19. Semantic scene completion from a single depth image. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1746–1754, 2017.
  20. Occ3d: A large-scale 3d occupancy prediction benchmark for autonomous driving. arXiv preprint arXiv:2304.14365, 2023.
  21. Forknet: Multi-branch volumetric semantic completion from a single depth image. In Proceedings of the IEEE/CVF international conference on computer vision, pages 8608–8617, 2019.
  22. Surroundocc: Multi-camera 3d occupancy prediction for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 21729–21740, 2023.
  23. Sparse single sweep lidar point cloud segmentation via learning contextual shape priors from scene completion. In AAAI, 2021a.
  24. Sparse single sweep lidar point cloud segmentation via learning contextual shape priors from scene completion. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3101–3109, 2021b.
  25. Dynamicbev: Leveraging dynamic queries and temporal context for 3d object detection. arXiv preprint arXiv:2310.05989, 2023.
  26. Ndc-scene: Boost monocular 3d semantic scene completion in normalized device coordinates space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9455–9465, 2023a.
  27. Geometry-guided ray augmentation for neural surface reconstruction with sparse views. arXiv preprint arXiv:2310.05483, 2023b.
  28. Improving depth gradient continuity in transformers: A comparative study on monocular depth estimation with cnn. arXiv preprint arXiv:2308.08333, 2023c.
  29. Cascaded context pyramid for full-resolution 3d semantic scene completion. In CVPR, 2019.
  30. Occformer: Dual-path transformer for vision-based 3d semantic occupancy prediction. arXiv preprint arXiv:2304.05316, 2023.
Citations (10)

Summary

We haven't generated a summary for this paper yet.