- The paper introduces VConv-DAE, a fully convolutional volumetric autoencoder that learns shape representations from noisy 3D occupancy grids without relying on object labels.
- It reports a mean voxel error of 1.04% when denoising grids corrupted with 30% noise, and errors as low as 1.31% on shape completion tasks.
- Inference takes only 3 ms per shape, versus roughly 600 ms for the prior supervised baseline (3D ShapeNets), a runtime improvement of two orders of magnitude.
Overview of VConv-DAE: Deep Volumetric Shape Learning Without Object Labels
The paper "VConv-DAE: Deep Volumetric Shape Learning Without Object Labels," authored by Abhishek Sharma, Oliver Grau, and Mario Fritz, addresses a pivotal challenge in the field of 3D geometry acquisition and processing. This challenge is particularly relevant given the increasing ubiquity of affordable depth sensors, which facilitate the widespread capture of 3D data. Despite their potential, existing scanning devices like the Kinect often produce noisy or incomplete shapes due to sensor noise and viewpoint occlusions.
Contributions and Methodology
The core contribution of this research is the introduction of a fully convolutional volumetric autoencoder, VConv-DAE, that learns volumetric representations directly from noisy data without relying on object labels. This differentiates it from previous methods, which are often tied to labeled training data and thus incur additional annotation cost and complexity.
- Architecture: VConv-DAE employs a fully convolutional encoder-decoder over voxel occupancy grids to learn a deep embedding of object shapes. The network is trained in an unsupervised, denoising-autoencoder fashion: given a corrupted grid, it predicts the occupancy of every voxel, thereby learning to complete and denoise 3D shapes (see the sketch after this list).
- Performance: The proposed approach outperforms preceding methods on denoising and shape completion, while its learned embedding yields competitive classification accuracy and plausible shape interpolations, underscoring the effectiveness of the unsupervised framework.
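To make the architecture bullet concrete, here is a minimal PyTorch sketch of a fully convolutional volumetric denoising autoencoder in the spirit of VConv-DAE. The 30³ grid resolution, layer sizes, corruption rate, and all function names are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumptions: 30^3 grids, layer sizes, 30% flip noise).
import torch
import torch.nn as nn

class VolumetricDAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided 3D convolutions map a 1x30x30x30 occupancy
        # grid down to a small 256x4x4x4 latent feature volume.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=9, stride=3),               # -> 64 x 8^3
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 256, kernel_size=4, stride=2, padding=1),  # -> 256 x 4^3
            nn.ReLU(inplace=True),
        )
        # Decoder: transposed convolutions mirror the encoder back to 30^3.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(256, 64, kernel_size=4, stride=2, padding=1),  # -> 64 x 8^3
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 1, kernel_size=9, stride=3),               # -> 1 x 30^3
            nn.Sigmoid(),  # per-voxel occupancy probability
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def corrupt(grid, rate=0.3):
    """Randomly flip a fraction of voxels to simulate sensor noise."""
    flip = torch.rand_like(grid) < rate
    return torch.where(flip, 1.0 - grid, grid)

# One unsupervised training step: reconstruct the clean grid from a noisy one.
model = VolumetricDAE()
clean = (torch.rand(8, 1, 30, 30, 30) > 0.5).float()  # dummy occupancy batch
loss = nn.functional.binary_cross_entropy(model(corrupt(clean)), clean)
loss.backward()
```

Because the encoder is fully convolutional, the latent code is a small feature volume rather than a flat vector; the interpolation results mentioned above then amount to decoding a linear blend of two embeddings (e.g. `model.decoder(0.5 * z_a + 0.5 * z_b)`), and the same embedding can be fed to a simple classifier for the classification experiments.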
Numerical Results and Evaluation
The paper evaluates VConv-DAE against supervised baselines, reporting the following results:
- In denoising, VConv-DAE attains an average voxel error of 1.04% at 30% noise, markedly lower than supervised baselines such as 3D ShapeNets, whose errors exceed 12% under the same conditions.
- In shape completion, error rates as low as 1.31% for 10% slicing noise further validate its efficacy (see the error-metric sketch after this list).
- Runtime efficiency improves by two orders of magnitude: inference takes only 3 ms per shape versus 600 ms for 3D ShapeNets.
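For reference, the percentages above correspond to a per-voxel occupancy mismatch between the binarized reconstruction and the ground truth. Below is a hedged sketch of that metric and of a slicing-style corruption for the completion setting; the function names and the exact corruption scheme are our assumptions, not the paper's.

```python
import torch

def voxel_error(pred_prob, target, threshold=0.5):
    """Mean per-voxel mismatch, in percent, after binarizing the prediction."""
    pred = (pred_prob > threshold).float()
    return 100.0 * (pred != target).float().mean().item()

def slice_noise(grid, rate=0.1):
    """Zero out a random fraction of slices along one axis, mimicking
    the shape-completion corruption (assumed scheme)."""
    out = grid.clone()
    depth = out.shape[-1]
    dropped = torch.randperm(depth)[: max(1, int(rate * depth))]
    out[..., dropped] = 0.0
    return out

# Usage: compare a model's reconstruction of the corrupted grid against
# the clean ground truth; lower is better.
clean = (torch.rand(1, 1, 30, 30, 30) > 0.5).float()
print(voxel_error(slice_noise(clean), clean))  # error of the corrupted input itself
```

The runtime gap is plausibly explained by inference cost: VConv-DAE needs only a single feed-forward pass per shape, whereas the generative baseline relies on iterative sampling.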
Theoretical and Practical Implications
From a theoretical standpoint, VConv-DAE marks an important step toward unsupervised feature learning in 3D domains, departing from traditionally label-dependent pipelines. This not only reduces the cost of acquiring labeled data but also opens avenues for more scalable volumetric data processing.
Practically, this research could enhance applications where 3D data is pivotal, including augmented reality, robotic interaction, and 3D printing. The ability to accurately reconstruct and denoise 3D data without labels lowers the barrier to real-world deployment in these domains.
Future Directions
Future research may extend VConv-DAE to deformable objects and larger, more complex scenes. Integrating the approach into dynamic, real-time pipelines could further unlock its potential in industry-grade scenarios, and continued work on reducing model complexity and improving runtime would also benefit mobile and battery-powered deployments.