
Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion (2012.03762v1)

Published 7 Dec 2020 in cs.CV

Abstract: LiDAR point cloud analysis is a core task for 3D computer vision, especially for autonomous driving. However, due to the severe sparsity and noise interference in the single sweep LiDAR point cloud, the accurate semantic segmentation is non-trivial to achieve. In this paper, we propose a novel sparse LiDAR point cloud semantic segmentation framework assisted by learned contextual shape priors. In practice, an initial semantic segmentation (SS) of a single sweep point cloud can be achieved by any appealing network and then flows into the semantic scene completion (SSC) module as the input. By merging multiple frames in the LiDAR sequence as supervision, the optimized SSC module has learned the contextual shape priors from sequential LiDAR data, completing the sparse single sweep point cloud to the dense one. Thus, it inherently improves SS optimization through fully end-to-end training. Besides, a Point-Voxel Interaction (PVI) module is proposed to further enhance the knowledge fusion between SS and SSC tasks, i.e., promoting the interaction of incomplete local geometry of point cloud and complete voxel-wise global structure. Furthermore, the auxiliary SSC and PVI modules can be discarded during inference without extra burden for SS. Extensive experiments confirm that our JS3C-Net achieves superior performance on both SemanticKITTI and SemanticPOSS benchmarks, i.e., 4% and 3% improvement correspondingly.

Authors (7)
  1. Xu Yan
  2. Jiantao Gao
  3. Jie Li
  4. Ruimao Zhang
  5. Zhen Li
  6. Rui Huang
  7. Shuguang Cui
Citations (239)

Summary

  • The paper introduces JS3C-Net, which integrates sparse segmentation and semantic scene completion to learn contextual shape priors from single sweep LiDAR data.
  • The architecture employs a Point-Voxel Interaction module and uncertainty-weighted multi-task learning to enhance segmentation accuracy in challenging sparse environments.
  • Experimental results on SemanticKITTI and SemanticPOSS demonstrate mIoU improvements and better detection of under-represented classes, highlighting its real-time potential.

Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion

LiDAR point cloud analysis is an essential component of 3D computer vision, with extensive applications, especially in autonomous driving. However, single-sweep LiDAR data presents significant challenges due to inherent sparsity and noise. The paper "Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion" proposes JS3C-Net, a framework that enhances semantic segmentation by integrating contextual shape priors learned through semantic scene completion.

Technical Contributions and Methodology

  1. Framework Architecture: JS3C-Net uses a sparse-convolution U-Net for initial semantic segmentation. The segmentation result feeds into a Semantic Scene Completion (SSC) module, which is supervised by dense targets obtained by merging multiple frames of the LiDAR sequence. The SSC module thereby learns contextual shape priors that complete the sparse single-sweep point cloud into a dense one and, through fully end-to-end training, improve the segmentation itself.
  2. Point-Voxel Interaction Module: A distinctive component is the Point-Voxel Interaction (PVI) module, which promotes interaction between local and global representations of the LiDAR data. It facilitates knowledge transfer between the two tasks by connecting incomplete local point geometry with the complete voxel-wise global structure.
  3. Segmentation and Completion Optimization: The segmentation and scene completion tasks are trained jointly with uncertainty-weighted multi-task learning, balancing each task's contribution for simultaneous improvement on sparse data.
  4. Efficiency: A crucial benefit of this architecture is that the auxiliary modules (SSC and PVI) can be discarded during inference, so they add no computational load in real-time applications.
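The point-voxel fusion idea behind the PVI module can be illustrated with a deliberately simplified sketch: quantize each point to the voxel grid, look up the corresponding dense SSC feature, and concatenate it with the point's own feature. The function name, shapes, and nearest-voxel lookup below are illustrative assumptions; the actual PVI module uses a learned attention over neighbouring voxels rather than a single-voxel gather.

```python
import numpy as np

def fuse_point_voxel_features(points, point_feats, voxel_feats, voxel_size):
    """Nearest-voxel gather: a simplified stand-in for the PVI module.

    points:      (N, 3) point coordinates
    point_feats: (N, Cp) per-point features from the segmentation branch
    voxel_feats: (D, H, W, Cv) dense voxel features from the SSC branch
    voxel_size:  edge length of one voxel

    Returns (N, Cp + Cv) fused features: each point's local feature
    concatenated with the global feature of the voxel containing it.
    """
    idx = np.floor(points / voxel_size).astype(int)
    D, H, W, _ = voxel_feats.shape
    idx = np.clip(idx, 0, [D - 1, H - 1, W - 1])   # keep indices inside the grid
    gathered = voxel_feats[idx[:, 0], idx[:, 1], idx[:, 2]]
    return np.concatenate([point_feats, gathered], axis=1)

# Tiny demo: 2 points, a 2x2x2 voxel grid, 4-dim point / 3-dim voxel features
demo_points = np.array([[0.1, 0.1, 0.1], [1.5, 0.2, 0.3]])
demo_point_feats = np.ones((2, 4))
demo_voxel_feats = np.zeros((2, 2, 2, 3))
demo_voxel_feats[1, 0, 0] = 5.0                     # voxel containing the 2nd point
fused = fuse_point_voxel_features(demo_points, demo_point_feats,
                                  demo_voxel_feats, voxel_size=1.0)
```

In the full model this fused representation lets the incomplete per-point geometry benefit from the completed global structure predicted by the SSC branch.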
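The uncertainty-weighted multi-task objective mentioned above can be sketched with the standard homoscedastic-uncertainty formulation, in which each task loss is scaled by a learned log-variance parameter. The function below is a minimal stdlib sketch of that general scheme, not the paper's exact implementation; the specific names and example values are assumptions.

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Combine per-task losses with learned homoscedastic uncertainty.

    losses:   list of scalar task losses, e.g. [L_seg, L_ssc]
    log_vars: list of learned log-variance parameters s_i, one per task

    Each task contributes exp(-s_i) * L_i + s_i: tasks with higher
    predicted uncertainty are down-weighted, while the +s_i term
    penalizes the trivial solution of letting s_i grow without bound.
    """
    return sum(math.exp(-s) * L + s for L, s in zip(losses, log_vars))

# Example: segmentation loss 1.2, completion loss 0.8, both s_i initialized to 0,
# so the combined loss starts as the plain sum 1.2 + 0.8 = 2.0.
total = uncertainty_weighted_loss([1.2, 0.8], [0.0, 0.0])
```

In practice the `log_vars` would be registered as trainable parameters and optimized jointly with the network weights, so the balance between segmentation and completion adapts during training.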

Experimental Results

JS3C-Net demonstrates significant gains in semantic segmentation on the SemanticKITTI and SemanticPOSS benchmarks, with mIoU improvements of 4% and 3%, respectively. Notably, the framework performs well on small, often under-represented object classes such as motorcycles and bicycles, in stark contrast to previous methods, which struggle to segment them in sparsely populated LiDAR data.

Implications and Future Directions

The introduction of contextual shape priors from temporal LiDAR sequences into sparse point cloud segmentation demonstrates the value of leveraging sequential data in real-time systems. While this framework primarily addresses challenges in autonomous driving, the approach is adaptable to other 3D point cloud applications, potentially improving both indoor and outdoor scene understanding.

Future directions might explore integrating this segmentation-completion framework with additional auxiliary data sources, such as imagery or radar, to further improve segmentation performance. Another potential path is scaling the architecture to handle higher resolutions without increasing computational demand, expanding its applicability to more complex and dynamic environments.

In conclusion, JS3C-Net represents a significant step in LiDAR data processing, showcasing the practicality of using semantic scene completion to alleviate the inherent limitations of sparse single-sweep point clouds.