
360$^\circ$ Reconstruction From a Single Image Using Space Carved Outpainting (2309.10279v1)

Published 19 Sep 2023 in cs.CV and cs.GR

Abstract: We introduce POP3D, a novel framework that creates a full $360^\circ$-view 3D model from a single image. POP3D resolves two prominent issues that limit single-view reconstruction. Firstly, POP3D offers substantial generalizability to arbitrary categories, a trait that previous methods struggle to achieve. Secondly, POP3D further improves reconstruction fidelity and naturalness, a crucial aspect that concurrent works fall short of. Our approach marries the strengths of four primary components: (1) a monocular depth and normal predictor that serves to predict crucial geometric cues, (2) a space carving method capable of demarcating the potentially unseen portions of the target object, (3) a generative model pre-trained on a large-scale image dataset that can complete unseen regions of the target, and (4) a neural implicit surface reconstruction method tailored to reconstructing objects from RGB images along with monocular geometric cues. The combination of these components enables POP3D to readily generalize across various in-the-wild images and generate state-of-the-art reconstructions, outperforming similar works by a significant margin. Project page: http://cg.postech.ac.kr/research/POP3D
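
To make component (2) more concrete, the following is a minimal sketch of single-view space carving on a voxel grid. It is not POP3D's implementation: the orthographic camera looking along the depth axis, the voxel-unit depth map, and the function name `carve_single_view` are all assumptions chosen for illustration. A voxel is kept as "possibly occupied" only if its pixel lies inside the object silhouette and it sits at or behind the visible surface; everything else is carved away as known empty space.

```python
def carve_single_view(mask, depth, n_depth):
    """Carve a voxel grid from one view (hypothetical helper, illustrative only).

    Assumes an orthographic camera looking along the depth axis.
    mask[y][x]  : True where the pixel lies inside the object silhouette.
    depth[y][x] : depth of the visible surface in voxel units
                  (ignored where mask is False).
    Returns a nested list grid[y][x][z] of booleans: True where the voxel
    may still be occupied, False where it has been carved away.
    """
    height = len(mask)
    width = len(mask[0])
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            ray = []
            for z in range(n_depth):
                # Outside the silhouette: the whole ray is empty.
                # In front of the visible surface: empty (we can see through it).
                # At or behind the surface: unseen, so possibly occupied.
                ray.append(bool(mask[y][x]) and z >= depth[y][x])
            row.append(ray)
        grid.append(row)
    return grid


# Tiny 2 x 2 image with 3 depth slices.
mask = [[False, True], [True, True]]
depth = [[0, 1], [2, 0]]
occupancy = carve_single_view(mask, depth, n_depth=3)
remaining = sum(v for row in occupancy for ray in row for v in ray)
print(remaining)  # 6 voxels remain possibly occupied
```

With more views, the same test would be repeated per camera and the results intersected, so that a voxel survives only if no view has carved it. The surviving voxels that are not visible from the input view correspond to the "potentially unseen portions" that the abstract says the generative model is asked to complete.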
