
PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image (1812.04072v2)

Published 10 Dec 2018 in cs.CV

Abstract: This paper proposes a deep neural architecture, PlaneRCNN, that detects and reconstructs piecewise planar surfaces from a single RGB image. PlaneRCNN employs a variant of Mask R-CNN to detect planes with their plane parameters and segmentation masks. PlaneRCNN then jointly refines all the segmentation masks with a novel loss enforcing the consistency with a nearby view during training. The paper also presents a new benchmark with more fine-grained plane segmentations in the ground-truth, in which, PlaneRCNN outperforms existing state-of-the-art methods with significant margins in the plane detection, segmentation, and reconstruction metrics. PlaneRCNN makes an important step towards robust plane extraction, which would have an immediate impact on a wide range of applications including Robotics, Augmented Reality, and Virtual Reality.

Citations (212)

Summary

  • The paper introduces a novel architecture that adapts Mask R-CNN for detecting and reconstructing 3D planar surfaces from one RGB image.
  • It utilizes a segmentation refinement network and a warping-loss module to ensure coherent plane boundaries and precise geometric reconstruction.
  • Benchmarking on a new 100,000-image ScanNet dataset with fine-grained ground truth (averaging 14.7 planes per image) shows PlaneRCNN surpassing previous approaches.


The paper presents PlaneRCNN, a deep neural architecture designed for detecting and reconstructing planar surfaces from a single RGB image. This work builds upon the capabilities of Mask R-CNN by addressing its limitations in plane detection and reconstruction, ultimately proposing a comprehensive system for 3D scene understanding through piecewise planar reconstruction.

Overview of PlaneRCNN Components

The architecture of PlaneRCNN is structured around three primary components:

  1. Plane Detection Network: The authors adapt Mask R-CNN to perform instance-level segmentation of planar regions. Beyond predicting planar masks, this network estimates plane normals and per-pixel depth values, from which planar surfaces are reconstructed in 3D using the camera intrinsics.
  2. Segmentation Refinement Network: PlaneRCNN introduces a refinement network that addresses Mask R-CNN's independent, per-instance mask predictions by jointly optimizing all masks in a global context, improving the coherence of planar boundaries. This module uses a technique akin to non-local networks to aggregate information across masks.
  3. Warping-Loss Module: This component improves reconstruction accuracy by enforcing consistency with a nearby view during training. By warping the reconstructed 3D planes between views and penalizing disagreement, the system drives plane parameters and depth maps toward greater precision.
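The plane-to-depth geometry that ties the detection network's outputs together can be sketched as follows: for a plane with unit normal n and offset d (so n · X = d for 3D points X on the plane) and camera intrinsics K, the depth at a pixel follows from intersecting that pixel's viewing ray with the plane. This is a minimal NumPy sketch of that relation, not the paper's implementation; the function name and the toy intrinsics are illustrative.

```python
import numpy as np

def plane_to_depth(normal, offset, K, height, width):
    """Render a per-pixel depth map for a single plane.

    A pixel p = (u, v, 1) back-projects along the ray K^{-1} p, so the
    depth where the ray meets the plane n . X = offset is
    offset / (n . K^{-1} p).
    """
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    rays = np.linalg.inv(K) @ pixels                                      # 3 x HW
    denom = normal @ rays                                                 # HW
    depth = offset / np.clip(denom, 1e-6, None)  # guard near-parallel rays
    return depth.reshape(height, width)

# Toy example: fronto-parallel plane z = 2 (normal along +z, offset 2)
K = np.array([[500.0, 0.0, 32.0],
              [0.0, 500.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = plane_to_depth(np.array([0.0, 0.0, 1.0]), 2.0, K, 48, 64)
```

For the fronto-parallel plane every viewing ray has unit z-component after normalization by K, so the rendered depth is constant at 2; the warping loss then compares such rendered depths against those reprojected from a nearby view.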

Benchmark and Evaluation

The paper introduces a new benchmark for evaluating piecewise planar depthmap reconstruction, built from 100,000 images in ScanNet. Its ground truth captures more fine-grained plane segmentations than previous datasets, averaging 14.7 planes per image. On this benchmark, PlaneRCNN surpasses existing methods, including PlaneNet and traditional MRF-based approaches, on metrics such as plane detection accuracy and geometric reconstruction quality.
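A per-plane detection metric of the kind used for such benchmarks can be sketched as follows: a ground-truth plane counts as recalled when some predicted plane overlaps it above a mask-IoU threshold and agrees with it in depth over the overlap. The function, thresholds, and matching rule below are illustrative assumptions, not the paper's exact evaluation protocol.

```python
import numpy as np

def plane_recall(gt_masks, gt_depths, pred_masks, pred_depths,
                 iou_thresh=0.5, depth_thresh=0.1):
    """Fraction of ground-truth planes matched by some prediction.

    A GT plane is recalled when a predicted plane overlaps it with mask
    IoU above iou_thresh and the mean absolute depth error over the
    intersection is below depth_thresh (in metres). Both thresholds are
    illustrative defaults.
    """
    recalled = 0
    for gm, gd in zip(gt_masks, gt_depths):
        for pm, pd in zip(pred_masks, pred_depths):
            inter = np.logical_and(gm, pm)
            union = np.logical_or(gm, pm)
            if union.sum() == 0 or inter.sum() == 0:
                continue
            iou = inter.sum() / union.sum()
            if iou < iou_thresh:
                continue
            err = np.abs(gd[inter] - pd[inter]).mean()
            if err < depth_thresh:
                recalled += 1
                break  # this GT plane is matched; move on
    return recalled / max(len(gt_masks), 1)
```

Sweeping the depth threshold over a range of values yields a recall curve, which is a common way to compare plane-reconstruction methods at multiple accuracy levels.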

Key Contributions and Results

The main contributions of this paper are twofold:

  • Novel Architecture: PlaneRCNN marks a shift towards using detection networks, a method previously uncommon in depthmap reconstruction tasks. This approach allows the network to infer an arbitrary number of planar regions, improving flexibility and generalization across diverse scene types.
  • Superior Benchmark and Performance: The enhanced benchmark with fine-grained annotations enables a more rigorous evaluation of plane detection and segmentation. PlaneRCNN exhibits significant improvements over previous state-of-the-art methods on this benchmark.

The results demonstrate PlaneRCNN's ability to recover small planar surfaces and adapt to varied scene contexts. This adaptability is particularly evident when evaluating cross-domain generalization, where PlaneRCNN consistently outperforms its peers without domain-specific fine-tuning.

Implications and Future Directions

PlaneRCNN yields immediate implications for applications in Robotics, AR, and VR, where real-time and accurate 3D scene understanding is critical. Beyond planar detection, the method's architecture suggests potential extensions into layered depthmap modeling, possibly leading to enhanced scene completion or artifact-free view synthesis.

Future research could refine PlaneRCNN to incorporate temporal dynamics, allowing the model to process image sequences and learn correspondences between plane detections over time. This could further improve robustness and accuracy in real-world scenarios where multiple views of a scene are available.

In summary, PlaneRCNN is a significant advancement in the field of 3D reconstruction from monocular images, setting a new standard in plane detection and piecewise planar depthmap reconstruction.