3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks (1707.06375v3)

Published 20 Jul 2017 in cs.CV and cs.GR

Abstract: We propose a method for reconstructing 3D shapes from 2D sketches in the form of line drawings. Our method takes as input a single sketch, or multiple sketches, and outputs a dense point cloud representing a 3D reconstruction of the input sketch(es). The point cloud is then converted into a polygon mesh. At the heart of our method lies a deep, encoder-decoder network. The encoder converts the sketch into a compact representation encoding shape information. The decoder converts this representation into depth and normal maps capturing the underlying surface from several output viewpoints. The multi-view maps are then consolidated into a 3D point cloud by solving an optimization problem that fuses depth and normals across all viewpoints. Based on our experiments, compared to other methods, such as volumetric networks, our architecture offers several advantages, including more faithful reconstruction, higher output surface resolution, better preservation of topology and shape structure.

Authors (5)
  1. Zhaoliang Lun (1 paper)
  2. Matheus Gadelha (28 papers)
  3. Evangelos Kalogerakis (44 papers)
  4. Subhransu Maji (78 papers)
  5. Rui Wang (997 papers)
Citations (178)

Summary

  • The paper proposes a deep encoder-decoder network leveraging multi-view convolutional techniques to reconstruct 3D shapes from 2D sketches.
  • The method achieves superior performance against baselines in reconstructing accurate structural details and handling both synthetic and human-drawn inputs.
  • This technique offers potential applications in design and animation for rapid prototyping from sketches and could influence future network architectures focused on spatial coherence.

3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks

The paper "3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks" proposes a method for reconstructing three-dimensional shapes from two-dimensional line drawings, addressing a long-standing challenge in computer graphics and vision: converting a sketch to a 3D model has historically been cumbersome and inefficient. The authors introduce a deep encoder-decoder network architecture that bridges the gap between 2D line drawings and 3D shape representations by leveraging multi-view convolutional techniques.

Methodology

The approach centers on a convolutional neural network (ConvNet) structured as an encoder-decoder. The model takes as input either single-view or multi-view sketches and outputs depth maps and normal maps for multiple predetermined viewpoints. The encoder first transforms the sketches into a compact feature representation; a multi-view decoder then transforms these features into per-viewpoint depth and normal maps. These maps feed into a point cloud generation step that fuses depth and normals across all viewpoints by solving an optimization problem, and the resulting point cloud is finally converted into a polygon mesh that captures the 3D shape implied by the sketches.
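
As a rough illustration of this encoder-decoder structure (not the authors' exact network; the layer sizes, the number of output viewpoints, and names such as SketchEncoder are assumptions for the sketch), a minimal PyTorch version might look like:

```python
# Minimal sketch of a multi-view encoder-decoder, assuming PyTorch.
# Layer sizes, NUM_VIEWS, and all names are illustrative assumptions,
# not the paper's exact architecture.
import torch
import torch.nn as nn

NUM_VIEWS = 12  # assumed number of predetermined output viewpoints

class SketchEncoder(nn.Module):
    """Encodes a single-channel line drawing into a compact feature vector."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 256 -> 128
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(), # 32 -> 16
        )
        self.fc = nn.Linear(256 * 16 * 16, latent_dim)

    def forward(self, sketch):
        return self.fc(self.conv(sketch).flatten(1))

class MultiViewDecoder(nn.Module):
    """Decodes the latent code into depth (1ch) and normal (3ch) maps per view."""
    def __init__(self, latent_dim=512, num_views=NUM_VIEWS):
        super().__init__()
        self.num_views = num_views
        self.fc = nn.Linear(latent_dim, 256 * 16 * 16)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            # 4 channels per view: 1 depth + 3 normal components
            nn.ConvTranspose2d(32, 4 * num_views, 4, stride=2, padding=1),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 256, 16, 16)
        maps = self.deconv(h)                       # (B, 4*V, 256, 256)
        b, _, height, width = maps.shape
        maps = maps.view(b, self.num_views, 4, height, width)
        depth, normals = maps[:, :, :1], maps[:, :, 1:]
        return depth, torch.nn.functional.normalize(normals, dim=2)

if __name__ == "__main__":
    enc, dec = SketchEncoder(), MultiViewDecoder()
    sketch = torch.rand(1, 1, 256, 256)  # dummy line drawing
    depth, normals = dec(enc(sketch))
    print(depth.shape, normals.shape)    # (1, 12, 1, 256, 256) (1, 12, 3, 256, 256)
```

A multi-view sketch input would simply stack the views as additional input channels in the first convolution; the single-view case is shown here for brevity.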

Results and Comparisons

The paper extensively evaluates the proposed architecture against several baselines, including voxel-based methods and alternative multi-view synthesis approaches. The results demonstrate superior performance in fidelity to geometric structure, surface resolution, and topological preservation: metrics such as Hausdorff distance, Chamfer distance, and volumetric Jaccard distance favor the multi-view approach, which captures surface detail at finer granularity than voxel grids allow.
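
For reference, the symmetric Chamfer distance compares each point in one cloud to its nearest neighbor in the other, in both directions. Exact conventions vary (mean vs. sum, squared vs. unsquared distances); a brute-force NumPy version of one common variant, assuming clouds small enough for an all-pairs matrix, might be:

```python
# Brute-force symmetric Chamfer distance between two point clouds.
# Uses the mean, unsquared convention; other papers use sums or squares.
import numpy as np

def chamfer_distance(a, b):
    """a: (N, 3), b: (M, 3). Mean nearest-neighbor distance, both directions."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.random((1000, 3))                  # stand-in ground-truth cloud
    pred = gt + rng.normal(0, 0.01, gt.shape)   # noisy "reconstruction"
    print(chamfer_distance(pred, gt))
```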

Regarding reconstruction quality, the proposed method generally produces more accurate structural details. It effectively handles both synthetic sketches and human-drawn line drawings, showcasing the model's versatility and robustness against input noise and inconsistencies typical in hand-drawn artwork.

Implications and Future Directions

The presented technique could substantially streamline sketch-to-model conversion, with applications in design, animation, and educational contexts where rapid prototyping from sketch concepts is in demand. Furthermore, the consolidation of depth and normal maps into a point cloud is a particularly innovative step that could influence future neural network designs focused on spatial coherence and feature alignment.
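
To make the consolidation step concrete, each view's depth map can be back-projected through its camera into world space and the resulting points merged. The sketch below, assuming NumPy, orthographic cameras, and 4x4 world-from-camera matrices, shows only this basic fusion geometry; the paper's actual optimization additionally reconciles the predicted normals across views to regularize the fused surface:

```python
# Back-projecting per-view depth maps into a single merged point cloud.
# Illustrative sketch of the fusion geometry only, not the paper's exact
# optimization, which also fuses the predicted normals across viewpoints.
import numpy as np

def backproject_view(depth, world_from_cam, mask):
    """depth: (H, W) depths along -z in camera space; mask: (H, W) bool."""
    h, w = depth.shape
    # Pixel grid mapped to [-1, 1]^2 on the orthographic image plane.
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    pts_cam = np.stack([xs[mask], ys[mask], -depth[mask]], axis=-1)  # (K, 3)
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=-1)
    return (pts_h @ world_from_cam.T)[:, :3]  # homogeneous -> world space

def fuse_views(depths, cams, masks):
    """Concatenate back-projected points from all viewpoints into one cloud."""
    return np.concatenate(
        [backproject_view(d, c, m) for d, c, m in zip(depths, cams, masks)]
    )
```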

The authors hint at future work that could integrate the optimization process within the network, enabling a more seamless transition from sketches to 3D objects. Additionally, exploring dynamic input viewpoints could bring further realism and interactive capabilities to shape reconstructions. Integrating interactive modeling interfaces directly into the reconstruction pipeline may enhance usability for artists and designers, allowing for refined model adjustments beyond initial automated conversions.

Conclusion

This paper lays the groundwork for a substantial shift in how 3D models can be generated from 2D sketches. With the multi-view ConvNet architecture, the authors deliver a tool poised to improve efficiency and creativity in fields reliant on 3D modeling. The method may not yet capture production-quality detail, but its ability to generate structural proxies opens a pathway to more productive collaboration between AI systems and human designers in digital design.