360$^\circ$ Reconstruction From a Single Image Using Space Carved Outpainting (2309.10279v1)
Abstract: We introduce POP3D, a novel framework that creates a full $360^\circ$-view 3D model from a single image. POP3D resolves two prominent issues that limit single-view reconstruction. Firstly, POP3D offers substantial generalizability to arbitrary categories, a trait that previous methods struggle to achieve. Secondly, POP3D further improves reconstruction fidelity and naturalness, a crucial aspect that concurrent works fall short of. Our approach marries the strengths of four primary components: (1) a monocular depth and normal predictor that predicts crucial geometric cues, (2) a space carving method capable of demarcating the potentially unseen portions of the target object, (3) a generative model pre-trained on a large-scale image dataset that can complete unseen regions of the target, and (4) a neural implicit surface reconstruction method tailored to reconstructing objects using RGB images along with monocular geometric cues. The combination of these components enables POP3D to readily generalize across various in-the-wild images and generate state-of-the-art reconstructions, outperforming similar works by a significant margin. Project page: \url{http://cg.postech.ac.kr/research/POP3D}
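To make component (2) concrete, the following is a minimal sketch of silhouette-based space carving on a voxel grid. It is not the authors' implementation: it assumes orthographic views aligned with the grid axes (a real pipeline would project voxels through calibrated perspective cameras), and the names `carve`, `occupancy`, and `mask` are hypothetical. Voxels that survive carving form the region the object could still occupy, which includes the unseen portions that a generative model would later have to complete.

```python
import numpy as np

def carve(occupancy, mask, axis):
    """Remove voxels whose orthographic projection along `axis` lies outside `mask`.

    occupancy: (n, n, n) boolean grid, True where the object may still exist.
    mask:      (n, n) boolean silhouette seen when looking along `axis`.
    Assumption: axis-aligned orthographic views, chosen only for illustration.
    """
    # Broadcast the 2D silhouette along the viewing axis.
    keep = np.expand_dims(mask, axis=axis)
    return occupancy & keep

# Toy example: start from a fully occupied 8^3 grid.
n = 8
occupancy = np.ones((n, n, n), dtype=bool)

# First view looks along the z axis (axis=2) and sees a 4x4 square silhouette.
mask_z = np.zeros((n, n), dtype=bool)
mask_z[2:6, 2:6] = True
carved = carve(occupancy, mask_z, axis=2)

# Second view looks along the x axis (axis=0) and sees a thin slab silhouette.
mask_x = np.zeros((n, n), dtype=bool)
mask_x[1:7, 3:5] = True
carved2 = carve(carved, mask_x, axis=0)

print(carved.sum(), carved2.sum())
```

With a single view, the carved volume is the full extrusion of the silhouette along the viewing direction; each additional view can only shrink it, which is why the remaining volume is a conservative bound on the object's shape.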