- The paper introduces local SDF priors that decompose 3D shapes into spatial partitions, enhancing detail and reducing memory demands.
- It employs a compact four-layer neural network to accelerate training and reconstruction while robustly modeling thin structures and incomplete data.
- Evaluations on benchmark datasets show the method outperforms object-level approaches in reconstruction fidelity and computational efficiency.
Overview of "Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction"
The paper "Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction" proposes a novel deep learning approach to efficiently reconstruct complex 3D surfaces at scale, referred to as Deep Local Shapes (DeepLS). This method aims to enhance the representation and reconstruction of high-quality 3D shapes using a deep shape model that substantially reduces memory demands. Through local decomposition, DeepLS provides a compact representation while maintaining fine-grained detail, addressing limitations in both traditional and contemporary methods of 3D reconstruction.
DeepLS builds upon the concept of Signed Distance Functions (SDFs), which represent a 3D surface as the zero-level set of a scalar field. Classical methods often store dense SDFs on voxel grids, which, despite their success, suffer from high memory demands and a limited ability to capture fine details or extrapolate over incomplete data. DeepLS instead introduces a distributed representation built from local, continuous SDFs defined by neural networks: space is partitioned into small discrete regions, each encoded by a latent vector. Unlike object-level methods such as DeepSDF, which employ a single latent code per object, DeepLS uses a grid of local latent codes, enabling it to scale seamlessly to scene-level reconstruction.
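To make the partitioning concrete, here is a minimal PyTorch-style sketch of such a representation. The class name `LocalShapeGrid`, the grid resolution, code length, and the small stand-in decoder are illustrative assumptions rather than the paper's exact design; the point is the lookup: each query is routed to its cell's latent code and expressed in cell-local coordinates.

```python
import torch
import torch.nn as nn

class LocalShapeGrid(nn.Module):
    """Minimal sketch of a DeepLS-style representation: a regular grid of
    latent codes, each decoding to a continuous SDF over its own cell
    through a small shared network. Sizes are illustrative assumptions."""

    def __init__(self, dims=(16, 16, 16), cell_size=0.125, code_dim=128):
        super().__init__()
        self.register_buffer("dims", torch.tensor(dims))
        self.cell_size = cell_size
        n_cells = dims[0] * dims[1] * dims[2]
        # One trainable latent code per spatial partition (cell).
        self.codes = nn.Parameter(0.01 * torch.randn(n_cells, code_dim))
        # Shared decoder: (latent code, cell-local coords) -> signed distance.
        # A two-layer stand-in here; a four-layer variant is sketched later.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, x):
        """x: (N, 3) points inside the grid's box [0, dims * cell_size)."""
        ijk = torch.div(x, self.cell_size, rounding_mode="floor").long()
        ijk = torch.minimum(ijk.clamp(min=0), self.dims - 1)
        flat = (ijk[:, 0] * self.dims[1] + ijk[:, 1]) * self.dims[2] + ijk[:, 2]
        # Express each query in its cell's local frame so every code only
        # has to model shape within a small, normalized region.
        centers = (ijk.float() + 0.5) * self.cell_size
        local = (x - centers) / self.cell_size
        z = self.codes[flat]
        return self.decoder(torch.cat([z, local], dim=-1)).squeeze(-1)
```

Because the decoder is shared across all cells while each code only has to describe a small region, the prior it must learn is far simpler than a whole-object prior, which is the intuition behind the generalization and memory claims above.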
Key Contributions
- Local Shape Priors: DeepLS encodes surfaces with locally defined latent codes that represent continuous SDFs within a smaller region of space. This partitioning simplifies the prior distribution, leading to enhanced generalization across diverse scene structures while reducing memory footprints.
- Efficient Encoding: The decentralized, block-based structure enables efficient inference. The decoder is a compact, four-layer fully connected network in the spirit of DeepSDF but with only a fraction of the parameters of object-centric models, significantly accelerating both training and reconstruction (see the decoder sketch after this list).
- Robust Performance: DeepLS demonstrates strong reconstruction capabilities across varied datasets, and notably excels at reconstructing thin structures and partially observed scenes. On benchmark evaluations it achieves superior accuracy and completion compared to competing methods.
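The four-layer decoder mentioned in the second bullet could look roughly like this; only the depth is taken from the paper's description, while the hidden width and activations are assumptions:

```python
import torch.nn as nn

def make_local_decoder(code_dim=128, hidden=128):
    """Sketch of the compact decoder described in the paper: four fully
    connected layers mapping a local latent code plus a 3D query point to
    a signed distance. Hidden width and activations are assumptions; the
    bounded tanh output mirrors common SDF-network practice."""
    return nn.Sequential(
        nn.Linear(code_dim + 3, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1), nn.Tanh(),
    )
```

A network of this size evaluates far faster than DeepSDF's much larger eight-layer decoder, which is where the reported training and reconstruction speedups come from.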
Numerical Results and Experimental Findings
The paper reports a marked improvement in reconstruction fidelity, with DeepLS reducing Chamfer distance by an order of magnitude compared to methods such as DeepSDF and AtlasNet on standard 3D shape benchmarks. The method balances memory efficiency, resolution, and surface detail; crucially, its reconstructions remain complete even at high spatial compression, where traditional dense voxel methods falter. DeepLS also trains for consistency across neighboring local SDF boundaries, keeping adjacent cells in agreement without incurring excessive computation.
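For reference, the Chamfer distance driving these comparisons can be computed as below. Exact benchmark conventions vary (mean vs. sum, squared vs. unsquared distances), so this is one common variant rather than the paper's precise protocol:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3):
    average squared nearest-neighbor distance in both directions."""
    d_pq, _ = cKDTree(q).query(p)  # distance from each point of p to q
    d_qp, _ = cKDTree(p).query(q)  # distance from each point of q to p
    return np.mean(d_pq ** 2) + np.mean(d_qp ** 2)
```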
In validations on real-world and synthetic scenes, such as the ICL-NUIM and 3D Scene datasets, DeepLS extracts detailed geometry from depth observations, surpassing volumetric fusion approaches in both error and completeness, supporting its applicability beyond object-centric frameworks.
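As a rough sketch of how such depth-driven encoding could proceed under the representation above: the shared decoder stays fixed after being trained on a shape corpus, and only the local codes are optimized against SDF samples gathered from the depth observations. The loss, optimizer, and iteration budget below are assumptions:

```python
import torch

def encode_scene(grid, points, sdf_obs, iters=500, lr=1e-3, clamp=0.1):
    """Sketch of DeepLS-style scene encoding: freeze the shared decoder and
    fit only the per-cell latent codes to SDF samples (points, sdf_obs)
    derived from depth. `grid` is the LocalShapeGrid sketched earlier."""
    grid.decoder.requires_grad_(False)
    opt = torch.optim.Adam([grid.codes], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        pred = grid(points)
        # Clamped L1 loss on signed distance, as in DeepSDF-style fitting.
        loss = (pred.clamp(-clamp, clamp)
                - sdf_obs.clamp(-clamp, clamp)).abs().mean()
        loss.backward()
        opt.step()
    return grid
```

Because each code only influences its own cell, this optimization decomposes spatially, which is what lets the approach scale to full scenes rather than single objects.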
Implications and Future Directions
The research indicates a significant leap in addressing scalability and fidelity in scene reconstruction using deep geometry learning. Local representation models such as DeepLS, which learn spatially localized priors, could drive further advances in applications like autonomous navigation, augmented reality, and robotics, where environmental understanding from partial sensor data is paramount.
Future exploration may involve enhancing these priors with adaptive partitioning schemes or integrating varied sensory inputs, thereby comprehensively modeling dynamic and densely populated environments. Additionally, scaling this framework efficiently to real-time applications with continuous observation streams remains an open challenge. This approach offers a promising direction for embedding sophisticated shape understanding while balancing resource constraints in computationally intensive settings.