Occupancy Networks: Learning 3D Reconstruction in Function Space

Published 10 Dec 2018 in cs.CV | (1812.03828v2)

Abstract: With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.


Citations (2,672)


Summary

The paper introduces a continuous 3D representation using neural network decision boundaries to encode shapes at arbitrary resolutions.
It demonstrates versatile input handling by reconstructing detailed 3D models from images, noisy point clouds, and coarse voxel grids.
Experimental results on ShapeNet show performance competitive with or better than voxel-, point-cloud-, and mesh-based reconstruction baselines.

Overview of "Occupancy Networks: Learning 3D Reconstruction in Function Space"

The paper "Occupancy Networks: Learning 3D Reconstruction in Function Space" addresses the limitations of existing 3D output representations: voxel grids, whose memory grows cubically with resolution; point clouds, which carry no connectivity information; and meshes, which are typically tied to a template or a fixed topology. It proposes Occupancy Networks (ONet), which represent a 3D surface as the continuous decision boundary of a deep neural network classifier, so that geometry can be encoded and extracted at arbitrary resolution without a large memory footprint.

Key Contributions

Continuous 3D Representation: The paper proposes a shift from discrete representations to a continuous one, where the 3D surface is implicitly defined by the decision boundary of a neural network. This allows for an encoding of 3D structures at infinite resolution, improving detail preservation and reducing memory usage.
Versatile Input Handling: Occupancy Networks can handle various input forms, including single images, noisy point clouds, and low-resolution voxel grids. This is achieved through the conditioning of the neural network on the input, enabling it to process and reconstruct 3D shapes from diverse types of data.
Experimental Validation: The method is evaluated against several state-of-the-art baselines on the ShapeNet dataset across three tasks: single-image 3D reconstruction, point cloud completion, and voxel super-resolution. Results are reported in terms of IoU, Chamfer distance, and normal consistency. In single-image reconstruction, Occupancy Networks achieve the highest IoU and normal consistency, while their Chamfer distance is competitive with, but not uniformly better than, baselines that are trained directly on that metric.

Methodology

The core idea behind Occupancy Networks is to model the occupancy function $o : \mathbb{R}^3 \to \{0, 1\}$ with a neural network $f_\theta$. For a given point $p \in \mathbb{R}^3$ and an observation $x$, $f_\theta(p, x)$ predicts the probability that $p$ is inside the object. This approach unifies the handling of different input observation types through a conditioning mechanism.
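The mapping from a point and an observation to an occupancy probability can be sketched as follows. This is a minimal illustration, not the paper's architecture: the observation $x$ is assumed to be already encoded into a vector `c`, conditioning is done by plain concatenation, the weights are random, and the names `make_params`, `f_theta`, and the threshold of 0.5 are hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_params(rng, point_dim=3, code_dim=8, hidden=32):
    """Random weights for a tiny conditioned MLP (illustrative, untrained)."""
    return {
        "W1": rng.normal(0.0, 0.5, (point_dim + code_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.5, (hidden, 1)),
        "b2": np.zeros(1),
    }

def f_theta(params, p, c):
    """Occupancy probability for points p of shape (N, 3), given a code c of shape (code_dim,)."""
    # Assumed conditioning scheme: append the same code to every query point.
    code = np.broadcast_to(c, (p.shape[0], c.shape[0]))
    h = np.concatenate([p, code], axis=1)
    h = np.maximum(h @ params["W1"] + params["b1"], 0.0)  # ReLU hidden layer
    logits = h @ params["W2"] + params["b2"]
    return sigmoid(logits[:, 0])

rng = np.random.default_rng(0)
params = make_params(rng)
c = rng.normal(size=8)                   # stand-in for an encoded observation x
p = rng.uniform(-0.5, 0.5, size=(5, 3))  # query points in an assumed unit bounding cube
probs = f_theta(params, p, c)            # one probability per query point
inside = probs >= 0.5                    # assumed threshold; the surface is where probs crosses it
```

Because the function can be queried at any $p$, the output resolution is chosen at evaluation time rather than fixed by the representation.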

Training and Inference

Training: The network is trained with a cross-entropy loss between predicted and ground-truth occupancy at points sampled within the bounding volume of the object. The sampling strategy plays a crucial role in the model's performance; among the strategies compared, uniform sampling across the bounding volume was found to be the most effective.
Inference: For mesh extraction, a hierarchical approach called Multiresolution IsoSurface Extraction (MISE) is used. It evaluates the network on a coarse grid, subdivides only the cells that the predicted surface passes through, and repeats until the target resolution is reached; the isosurface is then extracted with the Marching Cubes algorithm. The paper additionally refines the extracted mesh using gradient information from the network to improve mesh quality.
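The two steps above can be sketched in a simplified form. The code below is an illustration under stated assumptions, not the authors' implementation: a hard sphere stands in for a trained network, the bounding volume is assumed to be the cube $[-0.5, 0.5]^3$, and `refine_active_cells` is a hypothetical helper that only performs the coarse-to-fine cell selection (it re-evaluates cell corners instead of caching grid values, and it stops before Marching Cubes and mesh refinement).

```python
import numpy as np

def sample_uniform(rng, n, lo=-0.5, hi=0.5):
    """Draw n query points uniformly from the cubic bounding volume [lo, hi]^3."""
    return rng.uniform(lo, hi, size=(n, 3))

def cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted and ground-truth occupancy."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)))

# Offsets of the 8 corners of a unit cell.
CORNERS = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)], dtype=float)

def refine_active_cells(occ_fn, n_init=4, n_levels=2, tau=0.5, lo=-0.5, hi=0.5):
    """Coarse-to-fine selection of the cells that the tau level set passes through.

    Returns the origins (M, 3) and the edge length of the cells at the finest level.
    """
    size = (hi - lo) / n_init
    idx = np.stack(np.meshgrid(*[np.arange(n_init)] * 3, indexing="ij"), axis=-1).reshape(-1, 3)
    origins = lo + idx * size
    for level in range(n_levels + 1):
        corners = origins[:, None, :] + CORNERS[None, :, :] * size   # (M, 8, 3)
        occupied = occ_fn(corners.reshape(-1, 3)).reshape(-1, 8) >= tau
        # A cell is active if its corners disagree: some inside, some outside.
        active = occupied.any(axis=1) & ~occupied.all(axis=1)
        origins = origins[active]
        if level == n_levels:
            break
        size /= 2.0  # split every active cell into 8 children
        origins = (origins[:, None, :] + CORNERS[None, :, :] * size).reshape(-1, 3)
    return origins, size

def sphere_occupancy(p, radius=0.3):
    """Stand-in for a trained network: 1.0 inside a sphere, 0.0 outside."""
    return (np.linalg.norm(p, axis=1) < radius).astype(float)

rng = np.random.default_rng(0)
points = sample_uniform(rng, 2048)
labels = sphere_occupancy(points)
loss = cross_entropy(np.full(len(points), 0.5), labels)  # loss of a constant 0.5 prediction
cells, cell_size = refine_active_cells(sphere_occupancy)
```

Only cells near the surface are subdivided, so the number of network evaluations grows roughly with the surface area rather than with the volume of the grid. One consequence of starting from a coarse grid is that structures smaller than an initial cell can be missed if no corner falls inside them.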

Experimental Results

Quantitative and qualitative analyses show that Occupancy Networks perform competitively with, and on most metrics better than, the baselines across the three tasks:

Single Image Reconstruction: The method is compared with voxel-based (3D-R2N2), point-based (PSGN), and mesh-based (Pixel2Mesh, AtlasNet) models. It achieves the highest IoU and normal consistency among them, and a competitive Chamfer distance even though, unlike PSGN, Pixel2Mesh, and AtlasNet, it is not trained on that metric. This suggests better preservation of surface detail and topology.
Point Cloud Completion: The model demonstrated its robustness by reconstructing high-quality surfaces from sparse and noisy point clouds, showing significant improvements over competing methods like Deep Marching Cubes (DMC).
Voxel Super-Resolution: ONet was effective at enhancing voxelized inputs, converting coarse voxel grids into highly detailed meshes.
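For readers unfamiliar with two of the metrics named above, the following sketch shows one common way to compute them. It is an assumption-laden illustration rather than the paper's evaluation code: IoU is estimated from occupancy values at a shared set of sample points, Chamfer distance is taken as the mean of the two directed nearest-neighbour distances between point sets sampled from the surfaces, and the function names are hypothetical. Normal consistency is omitted.

```python
import numpy as np

def volumetric_iou(occ_a, occ_b):
    """IoU of two shapes, estimated from boolean occupancy at the same sample points."""
    occ_a = np.asarray(occ_a, dtype=bool)
    occ_b = np.asarray(occ_b, dtype=bool)
    union = np.logical_or(occ_a, occ_b).sum()
    if union == 0:
        return 1.0  # both shapes empty at all samples: treated as identical by convention
    return float(np.logical_and(occ_a, occ_b).sum() / union)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (Na, 3) and b (Nb, 3).

    Brute-force pairwise distances; fine for small sets, quadratic in memory.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (Na, Nb)
    a_to_b = d.min(axis=1).mean()  # how far predicted points are from the reference
    b_to_a = d.min(axis=0).mean()  # how well the reference is covered
    return float(0.5 * (a_to_b + b_to_a))

iou = volumetric_iou([1, 1, 0, 0], [1, 0, 1, 0])
cd = chamfer_distance(np.array([[0.0, 0.0, 0.0]]), np.array([[1.0, 0.0, 0.0]]))
```

IoU requires a well-defined inside and outside, so it can only be computed for methods that output watertight shapes or volumes, whereas Chamfer distance applies to any method from which surface points can be sampled.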

Implications and Future Work

Occupancy Networks, by virtue of their continuous representation and adaptability to various input types, offer a versatile tool for numerous 3D reconstruction tasks. The method's ability to generate high-resolution, watertight meshes with complex topologies without excessive memory consumption could significantly advance fields such as computer graphics, virtual reality, and autonomous systems.

Future developments could focus on improving the efficiency of MISE, exploring extensions to other forms of input such as multi-view images, and integrating occupancy networks into larger pipelines for real-time applications. The exploration of different network architectures and sampling strategies could also yield insights to further enhance performance and applicability.

Acknowledgment: The authors acknowledge support from the Intel Network on Intelligent Systems and the Microsoft Research PhD Scholarship Programme.

In conclusion, Occupancy Networks combine a compact, resolution-independent representation with high-fidelity 3D geometry, and the paper makes a strong case for function-space representations in learning-based 3D reconstruction.
