3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation (2003.13867v1)

Published 30 Mar 2020 in cs.CV

Abstract: We present 3D-MPA, a method for instance segmentation on 3D point clouds. Given an input point cloud, we propose an object-centric approach where each point votes for its object center. We sample object proposals from the predicted object centers. Then, we learn proposal features from grouped point features that voted for the same object center. A graph convolutional network introduces inter-proposal relations, providing higher-level feature learning in addition to the lower-level point features. Each proposal comprises a semantic label, a set of associated points over which we define a foreground-background mask, an objectness score and aggregation features. Previous works usually perform non-maximum-suppression (NMS) over proposals to obtain the final object detections or semantic instances. However, NMS can discard potentially correct predictions. Instead, our approach keeps all proposals and groups them together based on the learned aggregation features. We show that grouping proposals improves over NMS and outperforms previous state-of-the-art methods on the tasks of 3D object detection and semantic instance segmentation on the ScanNetV2 benchmark and the S3DIS dataset.

Summary

The paper introduces 3D-MPA, a method that leverages multi proposal aggregation and graph convolutional networks to improve 3D instance segmentation.
It reports gains over prior state-of-the-art methods on ScanNetV2, reaching 64.2 mAP@25% and 49.2 mAP@50% for 3D object detection.
The approach combines top-down and bottom-up strategies: proposals are generated per object and then grouped, rather than pruned, into final instances.

Essay on "3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation"

The paper introduces 3D-MPA (Multi Proposal Aggregation), a method for instance segmentation on 3D point clouds. Its goal is to segment semantic instances in 3D data, such as the sparse point clouds derived from RGB-D sensors. The method takes an object-centric approach that addresses a limitation of earlier 3D instance segmentation pipelines, namely that pruning proposals can throw away correct predictions.

Technical Summary

3D-MPA is object-centric: each point in the cloud votes for the center of the object it belongs to. Object proposals are sampled from the predicted centers, and each proposal's features are learned from the grouped features of the points that voted for that center. A graph convolutional network (GCN) then models relations between proposals, adding higher-level features on top of the lower-level point features. Each proposal carries a semantic label, a set of associated points with a foreground-background mask, an objectness score, and aggregation features. Where earlier pipelines apply non-maximum suppression (NMS), which can discard potentially correct predictions, 3D-MPA keeps all proposals and groups them into final object instances according to the learned aggregation features.
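The voting and proposal-sampling steps can be illustrated with a minimal sketch. It is not the authors' implementation: in the paper the offsets are predicted by a learned network, whereas here they are hard-coded, and the function names, the farthest point sampling strategy, and the grouping radius are all assumptions chosen for illustration.

```python
import math

def vote_centers(points, offsets):
    """Each point votes for an object center: its position plus an offset.
    The offsets stand in for network predictions (assumption)."""
    return [tuple(p + o for p, o in zip(pt, off))
            for pt, off in zip(points, offsets)]

def farthest_point_sampling(candidates, k):
    """Pick up to k well-spread candidate indices, starting from index 0.
    The sampling strategy is an illustrative assumption."""
    chosen = [0]
    dist = [math.dist(c, candidates[0]) for c in candidates]
    while len(chosen) < min(k, len(candidates)):
        nxt = max(range(len(candidates)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(c, candidates[nxt]))
                for d, c in zip(dist, candidates)]
    return chosen

def group_votes(votes, proposal_ids, radius):
    """For each proposal, collect indices of points whose votes lie within radius."""
    return {pid: [i for i, v in enumerate(votes)
                  if math.dist(v, votes[pid]) <= radius]
            for pid in proposal_ids}

# Two toy objects on a line; the offsets point toward each object's center.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
offsets = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0), (0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]

votes = vote_centers(points, offsets)
proposal_ids = farthest_point_sampling(votes, k=2)
groups = group_votes(votes, proposal_ids, radius=0.3)
print(groups)  # {0: [0, 1], 2: [2, 3]}
```

In this toy case the votes of each object coincide, so two proposals recover the two objects exactly. With noisy votes and more proposals than objects, several proposals would land on the same object, which is the situation the aggregation step is designed for.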

The paper evaluates the method against state-of-the-art techniques on the ScanNetV2 benchmark and the S3DIS dataset. The results show gains in both 3D object detection and semantic instance segmentation, including improvements over VoteNet. On ScanNetV2, 3D-MPA reaches 64.2 mAP@25% and 49.2 mAP@50% for object detection.
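The 25% and 50% in these metrics are intersection-over-union (IoU) thresholds: a prediction counts as a match only if its IoU with a ground-truth object reaches the threshold. The sketch below shows that matching criterion on instance masks represented as sets of point indices. It is a simplified illustration with hypothetical function names, not the benchmark's evaluation code, and it omits the precision-recall averaging that produces the final mAP. For object detection the same criterion is applied to 3D bounding boxes rather than point masks.

```python
def mask_iou(pred, gt):
    """IoU between two instance masks given as collections of point indices."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    return len(pred & gt) / len(union) if union else 0.0

def is_match(pred, gt, threshold):
    """A prediction matches a ground-truth instance if IoU >= threshold."""
    return mask_iou(pred, gt) >= threshold

pred_mask = range(0, 6)    # predicted instance covers points 0..5
gt_mask = range(3, 11)     # ground-truth instance covers points 3..10

print(mask_iou(pred_mask, gt_mask))        # 3 shared points / 11 total
print(is_match(pred_mask, gt_mask, 0.25))  # True
print(is_match(pred_mask, gt_mask, 0.50))  # False
```

The same prediction can therefore count as correct at the 25% threshold and incorrect at the 50% threshold, which is why the mAP@50% figure is the stricter of the two.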

Key Contributions

The paper makes four main contributions:

Hybrid Approach: 3D-MPA integrates the benefits of both top-down and bottom-up approaches, facilitating robust object detection by allowing objects to receive multiple proposals while maintaining computational efficiency.
Graph Convolutional Network: The incorporation of a GCN enables enhanced interaction between proposals, enriching the feature set available for instance prediction.
Proposal Aggregation over NMS: Instead of pruning with NMS, the method groups proposals according to learned aggregation features, and the empirical results show that this grouping improves over NMS.
Efficiency: The method handles outlier proposals efficiently and gains a computational advantage because it operates on proposals, which are far fewer than the raw input points.
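The difference between pruning and aggregating can be made concrete with a toy example. The sketch below is a hedged illustration only: the proposal dictionaries, the hand-written two-dimensional "aggregation features", and the single-linkage grouping by a distance threshold are all assumptions made here, standing in for the learned features and the clustering used in the paper.

```python
import math

def iou(a, b):
    """IoU between two sets of point indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def nms(proposals, iou_threshold):
    """Greedy NMS: keep the best-scoring proposal, drop those overlapping it."""
    kept = []
    for p in sorted(proposals, key=lambda p: p["score"], reverse=True):
        if all(iou(p["points"], q["points"]) < iou_threshold for q in kept):
            kept.append(p)
    return [set(p["points"]) for p in kept]

def aggregate(proposals, feature_radius):
    """Group proposals whose features lie within feature_radius of each other
    (single linkage via union-find) and union the points of each group."""
    parent = list(range(len(proposals)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(proposals)):
        for j in range(i + 1, len(proposals)):
            d = math.dist(proposals[i]["feature"], proposals[j]["feature"])
            if d <= feature_radius:
                parent[find(j)] = find(i)

    instances = {}
    for i, p in enumerate(proposals):
        instances.setdefault(find(i), set()).update(p["points"])
    return list(instances.values())

# Proposals A and B each cover part of one object; C covers a second object.
proposals = [
    {"points": {0, 1, 2, 3}, "score": 0.9, "feature": (0.0, 0.1)},  # A
    {"points": {2, 3, 4, 5}, "score": 0.8, "feature": (0.1, 0.0)},  # B
    {"points": {10, 11, 12}, "score": 0.7, "feature": (5.0, 5.0)},  # C
]

nms_instances = nms(proposals, iou_threshold=0.3)
agg_instances = aggregate(proposals, feature_radius=0.5)
print([sorted(s) for s in nms_instances])  # [[0, 1, 2, 3], [10, 11, 12]]
print([sorted(s) for s in agg_instances])  # [[0, 1, 2, 3, 4, 5], [10, 11, 12]]
```

In this example NMS drops proposal B because it overlaps A, so points 4 and 5 are lost from the first object. Aggregation keeps both proposals and merges them, recovering the full object. The pairwise loop also runs over proposals rather than raw points, which is the source of the efficiency argument in the list above.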

Implications and Future Direction

3D-MPA has practical implications for robotics, augmented reality, and other domains where understanding 3D environments is crucial. Its ability to group proposals into meaningful instances could support more sophisticated real-time scene understanding.

Furthermore, the aggregation-based approach may combine well with tracking systems in dynamic environments, building on the method's proposal generation and interaction capabilities. The paper also hints at extending the approach to 4D proposals that integrate temporal changes, an area open for exploration in semi-dynamic sequence processing.

In conclusion, 3D-MPA replaces proposal pruning with learned proposal aggregation, a change that is both methodologically distinct and practically useful for 3D instance segmentation, and it offers a basis for further work on 3D scene understanding beyond static datasets.