
Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction (2312.02221v2)

Published 3 Dec 2023 in cs.CV and cs.GR

Abstract: We introduce multi-slice reasoning, a new notion for single-view 3D reconstruction which challenges the current and prevailing belief that multi-view synthesis is the most natural conduit between single-view and 3D. Our key observation is that object slicing is more advantageous than altering views to reveal occluded structures. Specifically, slicing is more occlusion-revealing since it can peel through any occluders without obstruction. In the limit, i.e., with infinitely many slices, it is guaranteed to unveil all hidden object parts. We realize our idea by developing Slice3D, a novel method for single-view 3D reconstruction which first predicts multi-slice images from a single RGB image and then integrates the slices into a 3D model using a coordinate-based transformer network for signed distance prediction. The slice images can be regressed or generated, both through a U-Net based network. For the former, we inject a learnable slice indicator code to designate each decoded image into a spatial slice location, while the slice generator is a denoising diffusion model operating on the entirety of slice images stacked on the input channels. We conduct extensive evaluation against state-of-the-art alternatives to demonstrate superiority of our method, especially in recovering complex and severely occluded shape structures, amid ambiguities. All Slice3D results were produced by networks trained on a single Nvidia A40 GPU, with an inference time less than 20 seconds.

Citations (7)

Summary

  • The paper introduces a novel Slice3D approach that predicts multi-slice cross-sections to reconstruct 3D objects, revealing occluded details from a single image.
  • It leverages both regression-based and diffusion model techniques to generate slice images without needing computationally heavy multi-view consistency.
  • Evaluations on datasets like ShapeNet and Objaverse demonstrate that Slice3D outperforms state-of-the-art methods with faster inference and improved occlusion recovery.

Unveiling the Hidden Dimensions: Innovative 3D Reconstruction from a Single Image

3D reconstruction from a single image is a complex challenge in computer vision. Traditional methods rely heavily on multiple images to reconstruct occluded or hidden parts of an object. However, an innovative approach, introduced as "Slice3D," heralds a shift from these traditional multi-view synthesis techniques.

Slice3D essentially dissects the visual information into what can be likened to thin cross-sectional layers or "slices" that, when assembled, reveal the object's complete 3D structure, including its occluded parts. This approach holds a striking advantage: it theoretically promises to expose every hidden segment of an object if provided with an infinite number of slices. Practically, even with a limited number of slices, Slice3D outperforms multi-view methods in revealing more of the object's structure, without the need to rotate the camera or maintain multi-view consistency.
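The limit argument above can be made concrete with a toy example (not the paper's code): slice a binary occupancy volume into slabs along the camera's depth axis. Every occupied voxel lands in exactly one slab, so with thin enough slabs nothing stays hidden, even parts that are fully occluded from the viewpoint.

```python
import numpy as np

def slice_volume(volume: np.ndarray, num_slices: int) -> list:
    """Split a (D, H, W) volume into depth slabs, each flattened to a 2D
    silhouette by max-projection along depth."""
    return [slab.max(axis=0) for slab in np.array_split(volume, num_slices, axis=0)]

# A solid 8^3 cube with a hollow cavity at its center: from any single
# viewpoint the cavity is fully occluded.
vol = np.ones((8, 8, 8), dtype=np.uint8)
vol[3:5, 3:5, 3:5] = 0  # hidden cavity

front_view = vol.max(axis=0)   # single-view silhouette: cavity invisible
coarse = slice_volume(vol, 4)  # 2-voxel slabs: cavity still masked by solid voxels
fine = slice_volume(vol, 8)    # voxel-thin slabs: cavity exposed

print(front_view[3, 3], coarse[1][3, 3], fine[3][3, 3])  # 1 1 0
```

The coarse slabs still occlude the cavity, while voxel-thin slabs expose it, mirroring the claim that more slices reveal more of the hidden structure.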

The process involves two main stages. First, multi-slice images are predicted from the single input image by one of two techniques: a regression-based network or a diffusion-based generator. The regression network injects a learnable slice indicator code to assign each decoded image to its spatial slice location, yielding a deterministic output. The diffusion model, in contrast, can generate multiple plausible slice sets, embracing the inherent ambiguity of unseen structures. Second, the predicted slices are integrated into a 3D model by a coordinate-based transformer network that produces a signed distance function, a continuous field defining the object's surface.
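The slice-indicator mechanism can be sketched at the shape level: one shared decoder produces all K slice images, told apart only by a per-slice learnable code injected into the shared features. The dimensions and random weights below are hypothetical stand-ins for learned parameters, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
K, C, H, W = 4, 16, 32, 32                     # slices, channels, spatial dims

feat = rng.standard_normal((C, H, W))          # shared encoder features (stand-in)
indicator = rng.standard_normal((K, C, 1, 1))  # one learnable code per slice location
w = rng.standard_normal(C)                     # shared decoder weights (stand-in)

def decode(x: np.ndarray) -> np.ndarray:
    """Stand-in decoder: a 1x1-conv-like projection from C channels to 1 image."""
    return np.tensordot(w, x, axes=([0], [0]))

# Inject the k-th code into the shared features, then decode each slice.
slices = np.stack([decode(feat + indicator[k]) for k in range(K)])
print(slices.shape)  # (4, 32, 32): one decoded image per slice location
```

Because only the injected code differs between slices, the decoder weights are shared across all slice locations, which is what makes the regression branch deterministic and parameter-efficient.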

Unlike methods that enforce multi-view consistency through computationally burdensome spatial attention mechanisms, Slice3D requires no such measures. Because the camera view never changes between slice prediction and 3D reconstruction, the method can rely on plain convolutional features, adding to its efficiency.
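The reconstruction stage outputs a signed distance function, whose zero level set is the object's surface. The following minimal numpy sketch uses an analytic unit-sphere SDF as a stand-in for the learned coordinate-based network, queried on a regular grid exactly as the network would be queried at arbitrary 3D points.

```python
import numpy as np

# Signed distance function: negative inside, positive outside, zero on
# the surface. An analytic sphere stands in for the learned network.
def sdf(points: np.ndarray) -> np.ndarray:
    return np.linalg.norm(points, axis=-1) - 1.0

# Query the field on a 64^3 coordinate grid.
ax = np.linspace(-1.5, 1.5, 64)
grid = np.stack(np.meshgrid(ax, ax, ax, indexing="ij"), axis=-1)
values = sdf(grid)                 # (64, 64, 64) signed distances

# Cells where the sign changes straddle the zero level set; a mesher
# such as marching cubes would triangulate exactly those cells.
inside = values < 0
print(inside.mean())  # fraction of grid volume inside, near (4/3)*pi / 27
```

A mesh extractor run at level zero on `values` would recover the surface; the same query-and-extract step applies to any learned SDF.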

Evaluations compared Slice3D to various state-of-the-art alternatives like DISN and One-2-3-45, demonstrating its superior ability to recover complex and severely occluded shape structures. The experiments covered widely recognized datasets such as ShapeNet and Objaverse, where Slice3D exhibited not only higher reconstruction quality but also better generalizability and handling of uncertain elements. All outputs were produced by networks trained on a single Nvidia A40 GPU, with inference times under 20 seconds, significantly quicker than several other methods, including those based on neural radiance fields (NeRFs).

Looking forward, Slice3D represents a conceptual leap for single-view 3D reconstruction. It is not limited to specific object categories and offers a more natural way to uncover occluded parts without the heavy computational cost of spatial attention across views. While the current implementation targets primarily digital 3D models and may still struggle with fine object parts or those far from the camera, future work on more refined slicing and on applications to physical objects beckons.
