
JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds (2007.06888v1)

Published 14 Jul 2020 in cs.CV

Abstract: Semantic segmentation and semantic edge detection can be seen as two dual problems with close relationships in computer vision. Despite the fast evolution of learning-based 3D semantic segmentation methods, little attention has been drawn to the learning of 3D semantic edge detectors, even less to a joint learning method for the two tasks. In this paper, we tackle the 3D semantic edge detection task for the first time and present a new two-stream fully-convolutional network that jointly performs the two tasks. In particular, we design a joint refinement module that explicitly wires region information and edge information to improve the performances of both tasks. Further, we propose a novel loss function that encourages the network to produce semantic segmentation results with better boundaries. Extensive evaluations on S3DIS and ScanNet datasets show that our method achieves on par or better performance than the state-of-the-art methods for semantic segmentation and outperforms the baseline methods for semantic edge detection. Code release: https://github.com/hzykent/JSENet

Citations (119)

Summary

  • The paper introduces a dual-task network that simultaneously performs semantic segmentation and edge detection for improved 3D scene interpretation.
  • The novel two-stream FCN architecture and custom dual semantic edge loss yield competitive mIoU and F-measure scores on S3DIS and ScanNet datasets.
  • The integrated refinement module leverages shared features to enhance semantic boundaries and mitigate noise in 3D point clouds.

Overview of JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds

The paper introduces JSENet, a novel architecture designed to address the dual tasks of semantic segmentation (SemSeg) and semantic edge detection (SemEdgeD) in 3D point cloud data. Semantic segmentation and edge detection are both critical aspects of scene understanding in computer vision, yet edge detection in 3D point clouds has not been extensively explored. JSENet simultaneously tackles these two interconnected tasks via a joint learning framework, enhancing the efficacy of each through mutual refinement.

JSENet proposes a two-stream fully-convolutional network (FCN) architecture. One stream targets semantic segmentation, while the other handles semantic edge detection. The network is grounded in the KPConv framework, which is well-suited for its efficient handling of point cloud data. A primary contribution of the paper is the introduction of a joint refinement module that exploits the intrinsic duality between segmentation and edge information, facilitating the production of precise semantic boundaries.
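To make the idea of cross-task refinement concrete, here is a minimal NumPy sketch of edge-guided smoothing on a point cloud: segmentation probabilities are averaged with neighbors, but averaging is suppressed across points the edge stream flags as boundaries. This is an illustrative simplification, not the paper's actual refinement module; the function name, the threshold `tau`, and the neighbor-index representation are assumptions for the example.

```python
import numpy as np

def edge_aware_refine(probs, neighbors, edge_score, tau=0.5):
    """Illustrative edge-guided refinement (not JSENet's exact module).

    probs:      (N, C) per-point segmentation probabilities
    neighbors:  (N, K) indices of each point's K nearest neighbors
    edge_score: (N,)   per-point edge probability from the edge stream
    """
    # Down-weight contributions from neighbors the edge stream marks
    # as boundary points, so smoothing does not blur across edges.
    w = np.where(edge_score[neighbors] < tau, 1.0, 0.0)      # (N, K)
    neigh = probs[neighbors]                                  # (N, K, C)
    smoothed = (w[..., None] * neigh).sum(axis=1)
    smoothed /= np.maximum(w.sum(axis=1, keepdims=True), 1.0)
    refined = 0.5 * (probs + smoothed)                        # blend with input
    return refined / refined.sum(axis=1, keepdims=True)      # renormalize
```

The effect is that interior points are pulled toward their neighborhood consensus while points near predicted edges keep their own distribution, which is one plausible way region and edge information can be wired together.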

Key Contributions

  1. Introduction of 3D Semantic Edge Detection: The paper pioneers the task of 3D SemEdgeD, presenting a framework that enriches point cloud data processing by detecting and refining semantic edges, which have been traditionally neglected.
  2. JSENet Framework: The architecture includes an FCN-based approach that integrates a semantic segmentation stream and a semantic edge detection stream. This setup allows for effective joint learning, leveraging shared feature representations and benefiting each task through cross-task refinement.
  3. Novel Loss Function: A dual semantic edge loss function is proposed. This loss is specifically designed to improve the boundary quality of semantic segmentation outputs by encouraging alignment with the true semantic edges, which is shown to improve boundary precision.
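The core intuition behind a boundary-aware loss of this kind can be sketched in a few lines: derive a soft edge map from the segmentation probabilities (a point is edge-like when its class distribution differs from its neighbors'), then penalize disagreement with ground-truth edge labels. This is a hedged reconstruction of the general idea, not the paper's exact formulation; the L1 neighbor difference and the binary cross-entropy form are assumptions for illustration.

```python
import numpy as np

def edge_map_from_probs(probs, neighbors):
    """Soft semantic-edge score per point, derived from segmentation
    probabilities: mean L1 distance between a point's class
    distribution and those of its K nearest neighbors, scaled to [0, 1].

    probs:     (N, C) per-point class probabilities
    neighbors: (N, K) neighbor indices
    """
    neigh_probs = probs[neighbors]                            # (N, K, C)
    diff = np.abs(neigh_probs - probs[:, None, :]).sum(-1)    # (N, K)
    return 0.5 * diff.mean(axis=1)  # L1 distance maxes at 2, so halve

def dual_edge_loss(probs, neighbors, gt_edges, eps=1e-7):
    """Binary cross-entropy between the derived edge map and
    ground-truth edge labels, pushing the segmentation output to
    produce boundaries where the annotation says edges are."""
    e = np.clip(edge_map_from_probs(probs, neighbors), eps, 1 - eps)
    return -np.mean(gt_edges * np.log(e) + (1 - gt_edges) * np.log(1 - e))
```

Because the edge map is computed from the segmentation stream's own output, gradients of this loss flow back into the segmentation branch and sharpen its boundaries, which is the mechanism the contribution describes.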

Results and Evaluation

The empirical evaluation of JSENet was conducted on two major datasets, S3DIS and ScanNet, both of which contain richly annotated indoor 3D scenes. On these datasets, JSENet showcased its effectiveness by achieving competitive results:

  • Semantic Segmentation: JSENet achieved a mean Intersection over Union (mIoU) score of 67.7% on the S3DIS dataset and 69.9% on the ScanNet dataset, demonstrating superior or comparable results to state-of-the-art methods.
  • Semantic Edge Detection: It also excelled in semantic edge detection, outperforming baseline methods adapted from 2D edge detection frameworks. Specifically, JSENet achieved a mean maximum F-measure (MF) score of 31.0% on S3DIS, indicating substantial improvement in edge detection accuracy.
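For reference, the mIoU metric reported above is the standard per-class Intersection-over-Union averaged over classes, as used on S3DIS and ScanNet. A minimal NumPy implementation:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection-over-Union over classes, the standard
    3D semantic segmentation metric on S3DIS/ScanNet.

    pred, gt: (N,) integer class labels per point
    """
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:  # skip classes absent from both pred and gt
            ious.append(inter / union)
    return float(np.mean(ious))
```

The maximum F-measure (MF) used for edge detection is analogous but operates on binary edge maps, taking the best F1 score over detection thresholds per class.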

Implications and Future Directions

The dual-task learning approach of JSENet not only improves performance on both segmentation and edge detection but also underscores the value of coupling related tasks to strengthen overall system capability. This interconnected framework sets a precedent for further exploration of joint learning methods in other domains of 3D computer vision, including outdoor environments and dynamic 3D scenes.

Future research could delve into optimizing the JSENet architecture to handle more diverse scenes or higher-resolution point clouds, as well as exploring its application to other relevant fields such as augmented reality and robotics. Another promising area of investigation lies in addressing noise and inaccuracies in ground truth annotations, which can affect model performance, particularly in edge detection tasks. The meticulous design of loss functions and refinement modules, as demonstrated, will continue to play a critical role in advancing the capabilities of 3D learning systems.