
DISN: Deep Implicit Surface Network for High-quality Single-view 3D Reconstruction (1905.10711v5)

Published 26 May 2019 in cs.CV

Abstract: Reconstructing 3D shapes from single-view images has been a long-standing research problem. In this paper, we present DISN, a Deep Implicit Surface Network which can generate a high-quality detail-rich 3D mesh from a 2D image by predicting the underlying signed distance fields. In addition to utilizing global image features, DISN predicts the projected location for each 3D point on the 2D image, and extracts local features from the image feature maps. Combining global and local features significantly improves the accuracy of the signed distance field prediction, especially for detail-rich areas. To the best of our knowledge, DISN is the first method that consistently captures details such as holes and thin structures present in 3D shapes from single-view images. DISN achieves state-of-the-art single-view reconstruction performance on a variety of shape categories reconstructed from both synthetic and real images. Code is available at https://github.com/xharlie/DISN and the supplementary material at https://xharlie.github.io/images/neurips_2019_supp.pdf

Citations (529)

Summary

  • The paper introduces DISN, which leverages deep learning to predict signed distance fields from single-view images for high-quality 3D reconstruction.
  • It integrates local and global feature extraction to accurately capture fine details such as holes and thin structures.
  • Quantitative evaluations using metrics such as Chamfer Distance, Earth Mover's Distance, and IoU demonstrate performance superior to state-of-the-art methods.
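As a point of reference for the metrics above, Chamfer Distance measures how closely two point sets match. The following is a minimal O(N·M) sketch of the symmetric variant; the exact normalization and sampling density used in the paper's evaluation may differ.

```python
import numpy as np

def chamfer_distance(A, B):
    """Symmetric Chamfer Distance between point sets A (Nx3) and B (Mx3).

    For each point, find the squared distance to its nearest neighbor in
    the other set; the metric is the sum of the two directional averages.
    Lower is better. Brute-force O(N*M) version for illustration only.
    """
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # NxM pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(A, A))  # → 0.0 for identical point sets
```

Identical sets score zero; the score grows as the reconstructed surface drifts from the ground truth.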

Overview of DISN: Deep Implicit Surface Network for High-quality Single-view 3D Reconstruction

The paper "DISN: Deep Implicit Surface Network for High-quality Single-view 3D Reconstruction" introduces a novel approach for 3D shape reconstruction from single-view images using a method called Deep Implicit Surface Network (DISN). This approach leverages deep learning to predict signed distance fields (SDFs) and subsequently generate high-quality 3D meshes rich in detail, addressing a significant challenge in computer vision and graphics.
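To make the representation concrete, a signed distance field assigns each 3D point its distance to the nearest surface, with a sign indicating inside versus outside; the surface itself is the zero level set. The following is a minimal sketch using an analytic sphere (DISN instead regresses these values with a network):

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance from each 3D point to a sphere's surface.

    Negative inside, zero on the surface, positive outside -- the sign
    convention a network like DISN is trained to regress for arbitrary
    shapes.
    """
    points = np.asarray(points, dtype=float)
    return np.linalg.norm(points - center, axis=-1) - radius

# The reconstructed mesh is the zero level set {p : SDF(p) = 0},
# typically extracted with Marching Cubes.
pts = np.array([[0.0, 0.0, 0.0],   # center: inside
                [1.0, 0.0, 0.0],   # on the surface
                [2.0, 0.0, 0.0]])  # outside
print(sphere_sdf(pts, center=np.zeros(3), radius=1.0))  # → [-1.  0.  1.]
```

Because the surface is defined implicitly as a level set rather than as a fixed mesh, shapes of arbitrary topology (holes, thin structures) can be represented without committing to a template.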

Key Contributions

DISN utilizes both global and local feature extraction to improve the accuracy of SDF predictions. By integrating local image patches into the feature extraction process, the network effectively captures fine-grained details that previous state-of-the-art methods often miss, such as holes and thin structures. This enhances the reconstruction quality significantly.

  1. Local Feature Extraction: DISN estimates camera parameters to project each 3D query point onto the image plane, extracting a local feature patch. The combination of these local features with global image features helps improve the network's ability to predict SDFs accurately, particularly in detail-rich areas.
  2. Implicit Surface Representation: The network uses SDFs as an implicit surface representation. This approach provides flexibility, allowing for the generation of complex, topology-variant structures, circumventing limitations imposed by explicit surface representations with fixed topology.
  3. Architecture and Training: DISN's architecture includes separate decoders for local and global features, enhancing its capacity to recover high-fidelity details from input images. The training is guided by a weighted loss function emphasizing points near the surface.
  4. Quantitative and Qualitative Performance: DISN has demonstrated superior performance across metrics such as Chamfer Distance (CD), Earth Mover's Distance (EMD), and Intersection over Union (IoU) when compared to other state-of-the-art methods, including 3D CNNs and models based on explicit surface generation.
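The local feature extraction in point 1 can be sketched in two steps: project a 3D query point through an estimated pinhole camera to continuous pixel coordinates, then bilinearly sample the convolutional feature maps at that location. The function names and the NumPy formulation below are illustrative, not the paper's implementation:

```python
import numpy as np

def project_points(points, K, Rt):
    """Project Nx3 world points to pixel coordinates with a pinhole camera.

    K is the 3x3 intrinsic matrix and Rt the 3x4 extrinsic [R|t]; both
    stand in for the camera parameters DISN estimates from the image.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])  # Nx4 homogeneous
    cam = (Rt @ homo.T).T                                  # Nx3 camera space
    pix = (K @ cam.T).T
    return pix[:, :2] / pix[:, 2:3]                        # Nx2 (u, v)

def sample_local_features(feat_map, uv):
    """Bilinearly sample an HxWxC feature map at continuous (u, v) points."""
    H, W, _ = feat_map.shape
    u = np.clip(uv[:, 0], 0, W - 1.001)
    v = np.clip(uv[:, 1], 0, H - 1.001)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    return ((1 - du) * (1 - dv) * feat_map[v0, u0]
            + du * (1 - dv) * feat_map[v0, u0 + 1]
            + (1 - du) * dv * feat_map[v0 + 1, u0]
            + du * dv * feat_map[v0 + 1, u0 + 1])
```

The sampled per-point features are then concatenated with the global image feature and fed to the SDF decoder, which is what lets the network localize fine detail at the queried point.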

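The weighted training objective mentioned in point 3 can be sketched as an L1 regression loss that up-weights query points whose ground-truth SDF magnitude is small, i.e. points near the surface. The threshold and weights below are illustrative placeholders, not the paper's exact hyperparameters:

```python
import numpy as np

def weighted_sdf_loss(pred, target, delta=0.01, m_near=4.0, m_far=1.0):
    """L1 SDF regression loss with extra weight near the surface.

    Points whose ground-truth |SDF| is below `delta` receive weight
    `m_near`, all others `m_far`, focusing the network's capacity on
    the region that determines the extracted zero level set.
    """
    w = np.where(np.abs(target) < delta, m_near, m_far)
    return (w * np.abs(pred - target)).mean()
```

Because the mesh is recovered from the zero level set, errors near the surface matter far more than errors deep inside or far outside the shape, which motivates this weighting.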
Implications and Future Directions

This research substantially advances the field of 3D reconstruction from images, offering a promising direction for applications in augmented reality, virtual reality, and even industrial design, where detailed and accurate 3D models are essential.

  • Practical Implications: DISN’s ability to consistently produce high-detail reconstructions from a single image holds potential for industries where rapid prototyping and design modifications are crucial.
  • Theoretical Contributions: On a theoretical level, the incorporation of both local and global features for SDF prediction offers insights into feature fusion strategies in deep networks.
  • Future Work: Future developments may focus on enhancing the domain adaptation capabilities of DISN, allowing it to handle more complex scenes and varied backgrounds. Moreover, extending the approach to include texture and material prediction could be explored, possibly integrating differentiable rendering techniques for more comprehensive 3D scene understanding.

The paper represents a significant step towards more accurate and detail-preserving single-view 3D reconstruction, paving the way for further innovations in its application and theory. The results demonstrate that incorporating implicit representations with advanced feature extraction methods can overcome many of the limitations seen in previous models.