Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views (2308.14078v2)

Published 27 Aug 2023 in cs.CV

Abstract: Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. While recent techniques employ image diffusion models to generate plausible images at novel viewpoints or to distill pre-trained diffusion priors into 3D representations via score distillation sampling (SDS), these methods often struggle to simultaneously achieve high-quality, consistent, and detailed results for both novel-view synthesis (NVS) and geometry. In this work, we present Sparse3D, a novel 3D reconstruction method tailored to sparse-view inputs. Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field. Specifically, we employ a controller that harnesses epipolar features from the input views, guiding a pre-trained diffusion model, such as Stable Diffusion, to produce novel-view images that remain 3D-consistent with the input. By tapping into the 2D priors of powerful image diffusion models, our integrated model consistently delivers high-quality results, even for open-world objects. To address the blurriness introduced by conventional SDS, we introduce category-score distillation sampling (C-SDS) to enhance detail. We conduct experiments on CO3DV2, a multi-view dataset of real-world objects. Both quantitative and qualitative evaluations demonstrate that our approach outperforms previous state-of-the-art methods on metrics for both NVS and geometry reconstruction.
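For context on the distillation step the abstract refers to: standard score distillation sampling (SDS), introduced in DreamFusion, treats the frozen diffusion model's noise residual as a gradient on the rendered image. The paper's C-SDS variant modifies this to reduce blurriness, but its exact form is not given in the abstract, so the sketch below shows only the conventional SDS gradient. Here $\theta$ are the radiance-field parameters, $x = g(\theta)$ the rendered view, $\hat{\epsilon}_\phi$ the conditioned noise predictor, and $w(t)$ a timestep weighting:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x_t = \sqrt{\bar{\alpha}_t}\,x + \sqrt{1-\bar{\alpha}_t}\,\epsilon .
```

A minimal PyTorch sketch of one such distillation step follows. This is an illustration of plain SDS, not the paper's C-SDS; `render_fn`, `eps_model`, `cond`, and `alphas_cumprod` are assumed placeholders for the NeRF renderer, the epipolar-conditioned diffusion model, its conditioning features, and the noise schedule:

```python
import torch

def sds_step(render_fn, eps_model, cond, alphas_cumprod, optimizer):
    """One hypothetical SDS update (DreamFusion-style), not the paper's C-SDS."""
    # Render a novel view from the current radiance field; grads flow into theta.
    x = render_fn()                                      # (B, 3, H, W), requires_grad
    t = torch.randint(20, 980, (1,), device=x.device)    # random diffusion timestep
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)          # \bar{alpha}_t from the schedule
    eps = torch.randn_like(x)
    x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * eps  # forward process q(x_t | x)
    with torch.no_grad():                                # the diffusion prior stays frozen
        eps_hat = eps_model(x_t, t, cond)                # conditioned noise estimate (assumed API)
    w = 1.0 - a_bar                                      # a common choice of w(t)
    grad = w * (eps_hat - eps)                           # SDS gradient on the rendered image
    optimizer.zero_grad()
    x.backward(gradient=grad)                            # chain rule pushes grad into theta
    optimizer.step()
```

The design point SDS exploits is that the diffusion model is never backpropagated through; only the rendered pixels receive the noise residual as a gradient, which is what makes distilling a large frozen 2D prior into a 3D representation tractable.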

Authors (6)
  1. Zi-Xin Zou
  2. Weihao Cheng
  3. Yan-Pei Cao
  4. Shi-Sheng Huang
  5. Ying Shan
  6. Song-Hai Zhang
Citations (21)