MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation (1902.04478v1)

Published 12 Feb 2019 in cs.CV

Abstract: We propose a new approach for 3D instance segmentation based on sparse convolution and point affinity prediction, which indicates the likelihood of two points belonging to the same instance. The proposed network, built upon submanifold sparse convolution [3], processes a voxelized point cloud and predicts semantic scores for each occupied voxel as well as the affinity between neighboring voxels at different scales. A simple yet effective clustering algorithm segments points into instances based on the predicted affinity and the mesh topology. The semantic for each instance is determined by the semantic prediction. Experiments show that our method outperforms the state-of-the-art instance segmentation methods by a large margin on the widely used ScanNet benchmark [2]. We share our code publicly at https://github.com/art-programmer/MASC.

Citations (79)


Summary

The paper introduces a novel U-Net framework that combines submanifold sparse convolution with multi-scale affinity predictions to efficiently segment 3D voxelized point clouds.
The method achieves a state-of-the-art average AP of 0.447 on the ScanNet benchmark, substantially outperforming previous techniques.
The work is relevant to robotics and autonomous systems, where precise 3D scene understanding matters; the speed of the clustering step is a noted limitation, so real-time use remains a goal rather than a demonstrated result.

Overview of MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation

The paper approaches 3D instance segmentation by combining sparse convolution with multi-scale affinity prediction. The proposed framework, MASC, processes a voxelized point cloud to predict a semantic score for each occupied voxel and the affinity between neighboring voxels, and then clusters the points into instances.

Methodology

The approach builds on submanifold sparse convolution to process large point clouds. The sparse convolutional network operates on voxelized data from indoor scenes and predicts semantic labels for occupied voxels together with affinities between neighboring voxels at several scales. A clustering algorithm then groups points into instances using the predicted affinities and the mesh topology.
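The voxelization step mentioned above can be illustrated with a minimal NumPy sketch. The function name `voxelize` and the 5 cm `voxel_size` are assumptions made for illustration; they are not taken from the paper or its code.

```python
import numpy as np

def voxelize(points, voxel_size=0.05):
    """Quantize an (N, 3) point array into occupied integer voxel coordinates.

    Returns the unique occupied voxels and, for every point, the index of
    the voxel it falls into. Only occupied voxels are stored, which is the
    sparsity that sparse convolutions exploit.
    voxel_size is a hypothetical value chosen for this example.
    """
    coords = np.floor(points / voxel_size).astype(np.int64)
    voxels, point_to_voxel = np.unique(coords, axis=0, return_inverse=True)
    return voxels, point_to_voxel.reshape(-1)

points = np.array([[0.01, 0.02, 0.03],
                   [0.02, 0.01, 0.04],   # falls in the same voxel as the first point
                   [0.32, 0.00, 0.00]])
voxels, point_to_voxel = voxelize(points)
```

Here three points collapse into two occupied voxels, and `point_to_voxel` lets per-voxel predictions be copied back to the original points.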

The network follows a U-Net design, with skip connections and features at multiple spatial resolutions. A dedicated branch with fully connected layers produces the semantic predictions, while multiple affinity branches predict the similarity between neighboring voxels, one branch per scale. Predicting affinity at several scales, from scale zero upward, lets the model capture affinities throughout the spatial hierarchy of the data.
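To make the idea of per-scale affinity concrete, the sketch below builds 0/1 affinity targets between neighboring voxels from instance labels. Several details are assumptions for illustration only: the 6-neighbor offsets, treating scale `s` as a grid coarsened by a factor of `2**s`, and the majority-vote pooling in `coarsen`. The summary does not specify any of these.

```python
import numpy as np

# The 6 face-adjacent neighbor offsets in a voxel grid (an assumption for
# this sketch; the neighborhood used by the paper is not stated here).
OFFSETS = np.array([[1, 0, 0], [-1, 0, 0],
                    [0, 1, 0], [0, -1, 0],
                    [0, 0, 1], [0, 0, -1]])

def affinity_targets(coords, instance_ids):
    """Return an (M, 6) 0/1 array: entry [i, k] is 1 when the k-th neighbor
    of voxel i is occupied and carries the same instance id."""
    lookup = {tuple(c): inst
              for c, inst in zip(coords.tolist(), instance_ids.tolist())}
    targets = np.zeros((len(coords), len(OFFSETS)), dtype=np.int64)
    for i, c in enumerate(coords):
        for k, off in enumerate(OFFSETS):
            neighbor = tuple((c + off).tolist())
            if lookup.get(neighbor) == instance_ids[i]:
                targets[i, k] = 1
    return targets

def coarsen(coords, instance_ids, scale):
    """Pool voxels onto a grid 2**scale times coarser, keeping the most
    frequent instance id per coarse voxel (hypothetical pooling rule;
    instance ids must be non-negative integers)."""
    coarse, inverse = np.unique(coords // (2 ** scale), axis=0,
                                return_inverse=True)
    inverse = inverse.reshape(-1)
    pooled = np.array([np.bincount(instance_ids[inverse == j]).argmax()
                       for j in range(len(coarse))])
    return coarse, pooled

# Four voxels in a row: two belong to instance 0, two to instance 1.
coords = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
instance_ids = np.array([0, 0, 1, 1])
targets_s0 = affinity_targets(coords, instance_ids)
coarse_coords, coarse_ids = coarsen(coords, instance_ids, scale=1)
targets_s1 = affinity_targets(coarse_coords, coarse_ids)
```

At scale 0 the targets are 1 inside each instance and 0 across the boundary between voxels 1 and 2; at scale 1 each instance collapses to a single coarse voxel, so the only remaining neighbor pair crosses the boundary and has affinity 0.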

A notable feature of the clustering algorithm is that it is parallel by design: each node is mapped to a neighboring node based on affinity, and the mapping is updated iteratively until it stabilizes. Nodes are merged into clusters using both the affinity scores and the connectivity given by the mesh topology.
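The iterative merging described above can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the 0.5 threshold, the mean-affinity rule between clusters, and the union-find bookkeeping are all assumptions, and the "parallel" choices are simulated with ordinary Python loops.

```python
import numpy as np

def cluster_by_affinity(num_nodes, edges, affinity, threshold=0.5):
    """edges: list of (u, v) pairs taken from the mesh connectivity.
    affinity: same-length list of predicted same-instance probabilities.
    Returns an array holding a cluster id for every node."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    while True:
        # Average the affinity over all edges linking each pair of clusters.
        sums, counts = {}, {}
        for (u, v), a in zip(edges, affinity):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            key = (min(ru, rv), max(ru, rv))
            sums[key] = sums.get(key, 0.0) + a
            counts[key] = counts.get(key, 0) + 1

        # Every cluster independently picks its most similar neighbor,
        # provided the mean affinity exceeds the (assumed) threshold.
        best = {}
        for (ru, rv), s in sums.items():
            mean = s / counts[(ru, rv)]
            for node, other in ((ru, rv), (rv, ru)):
                if mean > threshold and mean > best.get(node, (0.0, None))[0]:
                    best[node] = (mean, other)

        # Stop once no cluster wants to merge: the mapping has stabilized.
        if not best:
            return np.array([find(i) for i in range(num_nodes)])

        # Apply all merges chosen in this round.
        for node, (_, other) in best.items():
            parent[find(node)] = find(other)

# A chain of five nodes with one weak link between nodes 2 and 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
affinity = [0.9, 0.8, 0.1, 0.95]
labels = cluster_by_affinity(5, edges, affinity)
```

In this toy chain, nodes 0 to 2 end up in one cluster and nodes 3 and 4 in another, because the mean affinity across the weak link never exceeds the threshold.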

Results

The method outperforms existing state-of-the-art methods on the ScanNet benchmark. The authors report an average precision (AP) of 0.447, substantially ahead of other methods across various object categories. This suggests that MASC is robust across diverse 3D instance segmentation cases, particularly in complex indoor environments.
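For readers unfamiliar with the metric, the sketch below computes a plain, uninterpolated average precision for instance masks at a single IoU threshold. It illustrates the general idea only; the benchmark's actual evaluation protocol is not described in this summary and may differ, and the function names and the 0.5 threshold are assumptions.

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """Intersection over union of two boolean per-point masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union if union > 0 else 0.0

def average_precision(pred_masks, scores, gt_masks, iou_threshold=0.5):
    """Uninterpolated AP: predictions are visited in order of decreasing
    score, and each ground-truth instance can be matched at most once."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    matched = set()
    tp = np.zeros(len(order))
    for rank, p in enumerate(order):
        ious = [mask_iou(pred_masks[p], g) for g in gt_masks]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_threshold and best not in matched:
            matched.add(best)
            tp[rank] = 1
    precision = np.cumsum(tp) / (np.arange(len(order)) + 1)
    # Sum the precision at every rank where a new ground-truth instance
    # is recovered, then normalize by the number of ground-truth instances.
    return float(np.sum(precision * tp) / max(len(gt_masks), 1))

gt_masks = [np.array([1, 1, 1, 0, 0, 0], dtype=bool),
            np.array([0, 0, 0, 1, 1, 1], dtype=bool)]
pred_masks = [np.array([1, 1, 1, 0, 0, 0], dtype=bool),   # exact match
              np.array([0, 0, 1, 1, 0, 0], dtype=bool),   # straddles both
              np.array([0, 0, 0, 1, 1, 0], dtype=bool)]   # IoU 2/3 with second
scores = [0.9, 0.8, 0.7]
ap = average_precision(pred_masks, scores, gt_masks)
```

In this toy case the first and third predictions are true positives and the second is a false positive, so the precision values at the two hits are 1 and 2/3, giving an AP of 5/6.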

In the qualitative results, MASC separates instance boundaries well despite the inherent irregularity of 3D data and the challenges posed by occlusions and varying object orientations.

Implications and Future Directions

The proposed technique offers significant implications for practical applications in autonomous systems, robotics, and real-time 3D data processing tasks where instance identification is crucial. The method's efficiency in processing large-scale point clouds positions it as a formidable tool in advancing 3D scene understanding capabilities.

Future work could test how well the framework scales to outdoor environments or multi-object scenarios, which would broaden the contexts in which it can be applied. A closer study of the role of different scales in affinity prediction could also inform improvements to the clustering algorithm, perhaps even making it trainable end-to-end with back-propagation on GPUs.

The authors also recognise the need to address specific limitations, such as the speed of the clustering implementation and difficulties in segmenting co-planar objects. Such refinements may make later iterations of this framework more efficient, potentially achieving real-time performance in dynamic settings.

Conclusion

This research marks an advancement in employing sparse convolution techniques for 3D instance segmentation, showcasing the MASC framework's capabilities in multi-scale affinity predictions and efficient clustering. The promising results benchmarked on ScanNet affirm the efficacy of MASC and highlight potential areas for future research and practical implementations in the domain of computer vision and 3D data processing.

