- The paper presents PlaneNet, a novel deep neural model for reconstructing piece-wise planar depthmaps from a single RGB image.
- It employs Dilated Residual Networks and a specialized loss function to infer planar parameters and probabilistic segmentation masks efficiently.
- Quantitative and qualitative results demonstrate significant improvements in planar segmentation and depth estimation, with promising applications in AR and robotics.
Overview of PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image
The paper "PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image" introduces a deep neural network architecture for piece-wise planar reconstruction of depthmaps from a single RGB image. While deep neural networks have advanced single-image depth prediction significantly, piece-wise planar reconstruction requires a structured geometry representation and poses challenges that prior work had not fully addressed. The paper proposes a comprehensive solution to these challenges through the PlaneNet architecture, presented as the first end-to-end neural model for this type of reconstruction.
Methodology
The PlaneNet architecture takes a structured approach to depthmap prediction by inferring a fixed-size set of plane parameters together with corresponding probabilistic plane segmentation masks. The model is trained on more than 50,000 piece-wise planar depthmaps generated from the ScanNet dataset. A key feature of PlaneNet is its loss function, inspired by point-set generation methods, which is agnostic to the order of plane predictions. This lets the network cope with the inherent challenge that neither the number nor the order of planes to be regressed is known a priori.
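The order-agnostic idea can be sketched as a Chamfer-style set distance between predicted and ground-truth plane parameters: each ground-truth plane is matched to its nearest prediction, so permuting the prediction order leaves the loss unchanged. This is a minimal NumPy illustration under that assumption, not the paper's exact formulation (which combines this with segmentation and depth terms):

```python
import numpy as np

def chamfer_plane_loss(pred_planes, gt_planes):
    """Order-agnostic loss between two sets of plane parameters.

    pred_planes: (P, 3) predicted plane parameter vectors
    gt_planes:   (G, 3) ground-truth plane parameter vectors

    For every ground-truth plane, accumulate the squared distance to its
    nearest prediction; shuffling the prediction order does not change
    the result.
    """
    # Pairwise squared distances, shape (P, G), via broadcasting.
    diff = pred_planes[:, None, :] - gt_planes[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # For each ground-truth plane, keep only the closest prediction.
    return float(np.mean(np.min(dist, axis=0)))
```

Because the minimum over predictions is taken per ground-truth plane, gradients (in a framework like TensorFlow or PyTorch) flow only to the matched prediction, which is what makes regressing an unordered set feasible.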
The architecture leverages a variety of technical components, including Dilated Residual Networks (DRNs), to manage the pixel-wise prediction tasks necessary for depth reconstruction. Through its branches, PlaneNet predicts plane parameters, probabilistic segmentation masks, and depthmaps for non-planar surfaces, ensuring comprehensive 3D scene parsing from a single image input.
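To make the branch outputs concrete, the sketch below shows one way the per-plane depths and the non-planar depth branch could be fused through the probabilistic segmentation masks. It assumes a plane parameterization n · X = d and known camera intrinsics; the names (`assemble_depth`, `seg_probs`, `K_inv`) are illustrative, not taken from the paper:

```python
import numpy as np

def assemble_depth(plane_params, seg_probs, nonplanar_depth, K_inv, H, W):
    """Fuse per-plane depths and a non-planar depth branch into one depthmap.

    plane_params    : (P, 4) rows of [nx, ny, nz, d], plane n . X = d
    seg_probs       : (P + 1, H, W) softmax masks; channel 0 is non-planar
    nonplanar_depth : (H, W) depth predicted by the non-planar branch
    K_inv           : (3, 3) inverse camera intrinsics
    """
    # Rays through each pixel: K^{-1} [u, v, 1]^T.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    rays = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    rays = K_inv @ rays.astype(np.float64)                 # (3, H*W)

    # Start with the non-planar branch, weighted by its mask.
    depth = seg_probs[0].reshape(-1) * nonplanar_depth.reshape(-1)
    for i, (nx, ny, nz, d) in enumerate(plane_params):
        # A 3D point on the ray is Z * r; plugging into n . X = d
        # gives the per-pixel plane depth Z = d / (n . r).
        denom = nx * rays[0] + ny * rays[1] + nz * rays[2]
        plane_depth = d / np.clip(denom, 1e-6, None)
        depth += seg_probs[i + 1].reshape(-1) * plane_depth
    return depth.reshape(H, W)
```

The key design point is that the segmentation masks act as soft weights, so the assembled depthmap is differentiable with respect to both the plane parameters and the masks.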
Quantitative and Qualitative Results
Empirical evaluations demonstrate that PlaneNet surpasses existing methods on both plane segmentation and depth estimation metrics. It shows a significant improvement over traditional methods that reconstruct piece-wise planar depthmaps from RGB-D data, particularly when those methods are applied to inferred depthmaps. These advances are underscored by quantitative comparisons against competing baselines, including NYU-Toolbox, Manhattan World Stereo, and Piecewise Planar Stereo. PlaneNet not only excels in accuracy over inferred depthmaps but also remains competitive in conditions where the baselines are given ground-truth depthmaps.
Furthermore, the research highlights the superior depth prediction accuracy of PlaneNet in planar regions, at boundaries, and across entire images compared to state-of-the-art single-image depth inference techniques.
Applications and Future Work
The implications of this work are manifold, with immediate applications in fields such as augmented reality (AR) and robotics. For instance, PlaneNet's ability to identify and segment dominant planes in a scene can facilitate AR applications like virtual object placement or texture editing. These applications underscore the practical value of structured geometry inference for interfaces that let users interact with real-world environments.
Looking forward, one promising direction is extending the approach beyond depthmaps to structured geometry prediction in a full 3D context. Such advances could benefit applications that require detailed environmental modeling and could open new paradigms for interactive virtual spaces.
Overall, this paper represents a significant step in deep learning's capacity to infer structured geometric representations from highly unstructured input such as a single RGB image. Extending PlaneNet's principles offers promising opportunities for further research in computer vision and artificial intelligence.