Deformable Shape Completion with Graph Convolutional Autoencoders (1712.00268v4)

Published 1 Dec 2017 in cs.CV

Abstract: The availability of affordable and portable depth sensors has made scanning objects and people simpler than ever. However, dealing with occlusions and missing parts is still a significant challenge. The problem of reconstructing a (possibly non-rigidly moving) 3D object from a single or multiple partial scans has received increasing attention in recent years. In this work, we propose a novel learning-based method for the completion of partial shapes. Unlike the majority of existing approaches, our method focuses on objects that can undergo non-rigid deformations. The core of our method is a variational autoencoder with graph convolutional operations that learns a latent space for complete realistic shapes. At inference, we optimize to find the representation in this latent space that best fits the generated shape to the known partial input. The completed shape exhibits a realistic appearance on the unknown part. We show promising results towards the completion of synthetic and real scans of human body and face meshes exhibiting different styles of articulation and partiality.

Summary

The paper introduces a graph convolutional VAE that learns a latent shape space to reconstruct partial, non-rigid 3D shapes.
It employs latent space optimization to minimize differences between observable partial inputs and the generated complete shapes.
Experiments on synthetic FAUST range scans show lower Euclidean distance and volumetric error than previous approaches, and the method is also demonstrated on real Kinect-like scans.

Deformable Shape Completion with Graph Convolutional Autoencoders

The paper "Deformable Shape Completion with Graph Convolutional Autoencoders" addresses the completion of partial and occluded 3D shapes, focusing on objects that deform non-rigidly. The authors use a graph convolutional variational autoencoder (VAE) to learn a latent space of complete shapes, from which incomplete inputs are reconstructed. Whereas most previous methods target rigid objects and rely on volumetric convolutional neural networks (CNNs), this approach handles non-rigid deformations and operates directly on 3D meshes through graph convolutional operations.
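To make "graph convolutional operations on a mesh" concrete, the sketch below builds a vertex adjacency matrix from triangle faces and applies one graph convolution in which every vertex mixes its own features with its neighbours' before a shared linear map. It uses the simple first-order propagation rule popularized by Kipf and Welling as a stand-in; the paper's actual convolution operator may differ, and the layer width of 8 and the toy tetrahedron are illustrative assumptions.

```python
import numpy as np

def mesh_adjacency(faces: np.ndarray, num_vertices: int) -> np.ndarray:
    """Dense symmetric vertex adjacency matrix built from triangle faces."""
    adj = np.zeros((num_vertices, num_vertices))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = adj[j, i] = 1.0
    return adj

def graph_conv(x: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph convolution (stand-in operator, not necessarily the paper's):
    symmetric-normalised neighbourhood averaging with self-loops,
    followed by a shared linear map and a ReLU."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)

# Toy mesh: a tetrahedron (4 vertices, 4 triangles) with xyz coordinates as input features.
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 8))  # hypothetical layer width of 8
features = graph_conv(verts, mesh_adjacency(faces, len(verts)), w)
print(features.shape)  # (4, 8): one 8-dimensional feature vector per vertex
```

Because the weights are shared across vertices and only the mesh connectivity decides which vertices interact, the same layer applies to any mesh with the same topology, unlike a volumetric CNN that requires voxelizing the shape first.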

Methodology

At the core of the proposed method is a graph convolutional VAE that learns to encode complete shapes into a latent space and decode them back into realistic shapes. Training is performed on human body and face meshes exhibiting diverse styles of articulation. Because the network is trained only on complete shapes, it never sees partial inputs during training. It therefore does not learn a bias toward the specific patterns of missing data found in a training set, and it can handle arbitrary styles of occlusion at test time.
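The training objective described above can be sketched as a standard VAE loss evaluated on complete shapes only: encode, sample a latent code with the reparameterization trick, decode, and add a KL penalty toward a standard normal prior. This is a minimal sketch under stated assumptions: the linear `encode`/`decode` functions stand in for the graph convolutional layers, and the squared-error reconstruction term and `kl_weight=1e-3` are hypothetical choices rather than the paper's exact settings.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as mu + sigma * eps so that
    gradients can flow through mu and log_var (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def vae_loss(full_shape, encode, decode, rng, kl_weight=1e-3):
    """Loss for one complete training shape given as a (V, 3) vertex array.
    Only complete shapes enter training, so no occlusion pattern is learned.
    kl_weight is a hypothetical value."""
    mu, log_var = encode(full_shape)
    z = reparameterize(mu, log_var, rng)
    recon = decode(z)
    recon_err = np.mean(np.sum((recon - full_shape) ** 2, axis=1))
    return recon_err + kl_weight * kl_to_standard_normal(mu, log_var)

# Hypothetical linear encoder/decoder standing in for graph-convolutional layers.
V, latent_dim = 5, 2
rng = np.random.default_rng(0)
W_mu = 0.1 * rng.standard_normal((V * 3, latent_dim))
W_log_var = 0.1 * rng.standard_normal((V * 3, latent_dim))
W_dec = 0.1 * rng.standard_normal((V * 3, latent_dim))

def encode(x):
    flat = x.reshape(-1)
    return flat @ W_mu, flat @ W_log_var

def decode(z):
    return (W_dec @ z).reshape(V, 3)

shape = rng.standard_normal((V, 3))  # one complete toy shape
loss = vae_loss(shape, encode, decode, rng)
```

The KL term is what makes the latent space smooth enough to be searched later: codes near the prior decode to realistic shapes, which the inference-time optimization relies on.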

During inference, the trained decoder is held fixed, and the method searches its latent space for the representation whose decoded full shape minimizes the dissimilarity to the known partial observation. Because the latent space captures non-rigid deformations, this optimization can fit the articulation of the input, and the procedure also accommodates potential rigid transformations between the input and the output.
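The inference step can be sketched as gradient descent on the latent code, with the loss computed only on the vertices that are actually observed. This sketch makes several assumptions: `LinearDecoder` is a hypothetical linear stand-in for the trained nonlinear decoder (so the gradient can be written by hand instead of using autodiff), each observed point is assumed to already correspond to a known output vertex (`known_idx`), the optional `prior_weight` term is a hypothetical regularizer, and the rigid-alignment part of the method is not modelled.

```python
import numpy as np

class LinearDecoder:
    """Hypothetical stand-in for the trained decoder: shape = mean + basis @ z.
    The real graph-convolutional decoder is nonlinear; with it, the gradient
    below would come from automatic differentiation instead."""

    def __init__(self, mean, basis):
        self.mean = mean    # (V*3,) flattened mean shape
        self.basis = basis  # (V*3, latent_dim)

    def __call__(self, z):
        return (self.mean + self.basis @ z).reshape(-1, 3)

def fit_latent(decoder, partial, known_idx, z0, steps=2000, prior_weight=0.0):
    """Gradient descent on z so the decoded shape matches `partial` (k, 3),
    whose i-th point is assumed to correspond to vertex known_idx[i].
    Vertices outside known_idx do not enter the loss."""
    rows = (3 * np.asarray(known_idx)[:, None] + np.arange(3)).reshape(-1)
    B = decoder.basis[rows]
    target = partial.reshape(-1) - decoder.mean[rows]
    # Safe step size: 1 / Lipschitz constant of the quadratic objective.
    lr = 1.0 / (2.0 * (np.linalg.norm(B, 2) ** 2 + prior_weight))
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        grad = 2.0 * B.T @ (B @ z - target) + 2.0 * prior_weight * z
        z -= lr * grad
    return z, decoder(z)

# Toy check: hide 6 of 20 vertices and recover the full shape from the rest.
rng = np.random.default_rng(0)
V, d = 20, 4
decoder = LinearDecoder(rng.standard_normal(V * 3), rng.standard_normal((V * 3, d)))
z_true = rng.standard_normal(d)
full_true = decoder(z_true)
known_idx = np.arange(14)  # vertices 14..19 play the role of the occluded part
z_hat, completed = fit_latent(decoder, full_true[known_idx], known_idx, z0=np.zeros(d))
```

The occluded vertices never appear in the loss; they are filled in only because the decoder ties them to the same latent code that explains the visible part.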

Experiments and Results

The experiments demonstrate the robustness of the proposed method across several scenarios, including synthetic range scans and real range scans from Kinect-like sensors. The model can also produce several plausible yet different completions of the same input, reflecting the ambiguity inherent in shape completion.
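One simple way to expose this ambiguity, assumed here rather than taken from the paper, is to restart the latent optimization from several codes drawn from the prior. When the observed part does not constrain some latent directions, each restart fits the visible vertices equally well but completes the hidden ones differently. The toy basis below is hypothetical and deliberately built so that latent dimensions 2 and 3 only move occluded vertices.

```python
import numpy as np

def fit_latent(basis_known, target, z0, steps=2000):
    """Least-squares fit of z to the observed coordinates by gradient descent.
    Latent components the observed part does not constrain receive zero
    gradient and keep their initial values."""
    lr = 0.5 / np.linalg.norm(basis_known, 2) ** 2  # 1 / Lipschitz constant
    z = z0.copy()
    for _ in range(steps):
        z -= lr * 2.0 * basis_known.T @ (basis_known @ z - target)
    return z

rng = np.random.default_rng(1)
V, d, n_known = 6, 4, 3  # first 3 vertices observed, last 3 occluded
mean = rng.standard_normal(V * 3)
basis = rng.standard_normal((V * 3, d))
# Hypothetical structure: latent dims 2-3 only move the occluded vertices,
# so the partial observation cannot pin them down.
basis[: 3 * n_known, 2:] = 0.0
observed = (mean + basis @ rng.standard_normal(d))[: 3 * n_known]

completions = []
for _ in range(3):
    z0 = rng.standard_normal(d)  # restart from a sample of the N(0, I) prior
    z = fit_latent(basis[: 3 * n_known], observed - mean[: 3 * n_known], z0)
    completions.append((mean + basis @ z).reshape(V, 3))
```

All three completions agree on the observed vertices and differ on the occluded ones, which is the behaviour the paragraph describes; with a nonlinear decoder, different local minima can play the same role.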

Quantitative evaluations were conducted on synthetic range scans from the FAUST dataset, comparing the proposed method against existing approaches such as 3D-EPN and Poisson reconstruction. On these scans, the graph convolutional autoencoder achieves lower mean Euclidean distance and lower volumetric error on the completed shapes than the baselines.
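The two metrics can be sketched as follows, under assumptions that should be checked against the paper: the predicted and ground-truth meshes share vertex ordering (a common template), the Euclidean error is averaged over the vertices missing from the input, and volumetric error is taken as the relative difference of enclosed volumes, |V_pred - V_gt| / V_gt. The enclosed volume of a closed, consistently oriented mesh is computed with the divergence theorem as a sum of signed tetrahedra.

```python
import numpy as np

def mesh_volume(verts, faces):
    """Enclosed volume of a closed, consistently oriented triangle mesh:
    sum of signed tetrahedra spanned by the origin and each face."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return abs(np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum() / 6.0)

def mean_euclidean_error(pred, gt, missing_idx):
    """Mean per-vertex distance, restricted to vertices absent from the input."""
    return np.linalg.norm(pred[missing_idx] - gt[missing_idx], axis=1).mean()

def volumetric_error(pred, gt, faces):
    """Relative volume difference |V_pred - V_gt| / V_gt (assumed definition)."""
    v_gt = mesh_volume(gt, faces)
    return abs(mesh_volume(pred, faces) - v_gt) / v_gt

# Toy example: a unit tetrahedron with outward-oriented faces (volume 1/6).
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])
pred = gt.copy()
pred[3] += [0.0, 0.0, 0.5]  # the "completed" vertex 3 is off by 0.5
dist_err = mean_euclidean_error(pred, gt, [3])
vol_err = volumetric_error(pred, gt, faces)
```

Here vertex 3 moves by 0.5, so the mean Euclidean error is 0.5, and the volume grows from 1/6 to 1/4, a relative volumetric error of 0.5.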

The paper also proposes merging the completions of multiple partial views by averaging their latent representations. Because averaging interpolates both shape and pose, the fused result may deviate from the exact articulations in the inputs; nevertheless, the idea suggests a possible route toward dynamic fusion applications.
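The fusion step itself reduces to an element-wise mean of the per-view latent codes followed by a single decode. In the sketch below, the `decode` function is a hypothetical nonlinear stand-in for the trained decoder, and the two codes are random placeholders for codes that would come from fitting two partial views.

```python
import numpy as np

def fuse_views(latent_codes):
    """Fuse per-view completions by averaging their latent codes element-wise."""
    return np.mean(np.stack(latent_codes), axis=0)

rng = np.random.default_rng(0)
basis = rng.standard_normal((30, 4))

def decode(z):
    """Hypothetical nonlinear decoder stand-in: 10 vertices x 3 coordinates."""
    return np.tanh(basis @ z).reshape(10, 3)

z_front = rng.standard_normal(4)  # placeholder for a code fitted to one partial view
z_back = rng.standard_normal(4)   # placeholder for a code fitted to another view
fused = decode(fuse_views([z_front, z_back]))
```

Because the decoder is nonlinear, the fused shape is generally not the average of the two decoded shapes; it is whatever shape lies midway between the codes, which is why shape and pose can both be interpolated away from either input.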

Implications and Future Work

These results are relevant to fields that require detailed 3D reconstructions, such as virtual and augmented reality, human-computer interaction, and robotic perception. By handling non-rigid deformations, which most existing completion methods do not address, the approach lays a foundation for more adaptable reconstruction techniques.

Future work may aim to disentangle shape and pose in the latent space, giving finer control over the outputs through explicit modeling of non-rigid transformations. In addition, although the model is fairly robust to correspondence errors, better initialization on noisy real-world data could broaden its applicability.

In conclusion, the paper contributes a learning-based approach to 3D shape completion that advances the reconstruction of complex, non-rigid structures from incomplete data. Its integration of graph convolutional operations into a generative model points to a promising direction for geometric deep learning applications.