iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views (2312.17250v1)
Abstract: We present iFusion, a novel 3D object reconstruction framework that requires only two views with unknown camera poses. While single-view reconstruction yields visually appealing results, it can deviate significantly from the actual object, especially on unseen sides. Additional views improve reconstruction fidelity but require known camera poses. However, assuming poses are available may be unrealistic, and existing pose estimators fail in sparse-view scenarios. To address this, we harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects. Our strategy unfolds in three steps: (1) We invert the diffusion model for camera pose estimation instead of synthesizing novel views. (2) The diffusion model is fine-tuned using the provided views and estimated poses, turning it into a novel view synthesizer tailored to the target object. (3) Leveraging the registered views and the fine-tuned diffusion model, we reconstruct the 3D object. Experiments demonstrate strong performance in both pose estimation and novel view synthesis. Moreover, iFusion integrates seamlessly with various reconstruction methods and enhances them.
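Step (1) treats pose estimation as an inversion problem: the diffusion model's weights stay frozen, and the candidate relative pose is optimized so that the model's denoising error on the second view is minimized. The sketch below illustrates only that optimization pattern, with a toy quadratic surrogate standing in for the real denoising loss (in iFusion this would be the epsilon-prediction error of a view-conditioned model such as Zero-1-to-3); `TRUE_POSE`, `denoising_loss`, and `estimate_pose` are hypothetical names, not the paper's API.

```python
import numpy as np

# Hypothetical ground-truth relative pose (azimuth, elevation), used only
# to define the toy surrogate loss below.
TRUE_POSE = np.array([0.8, -0.3])

def denoising_loss(pose):
    # Toy stand-in for E||eps - eps_theta(x_t, view_A, pose)||^2: the real
    # loss measures how well the frozen diffusion model denoises view B
    # when conditioned on view A and the candidate relative pose.
    return np.sum((pose - TRUE_POSE) ** 2)

def estimate_pose(init, lr=0.1, steps=200):
    """Invert the model: gradient descent on the pose, weights frozen."""
    pose = init.astype(float).copy()
    for _ in range(steps):
        # Central finite differences stand in for backprop through the U-Net.
        grad = np.zeros_like(pose)
        h = 1e-4
        for i in range(len(pose)):
            d = np.zeros_like(pose)
            d[i] = h
            grad[i] = (denoising_loss(pose + d) - denoising_loss(pose - d)) / (2 * h)
        pose -= lr * grad
    return pose

estimated = estimate_pose(np.zeros(2))
```

The key design point is that only the pose parameters receive gradients; the same frozen network that would normally generate novel views instead scores pose hypotheses, which is why no pose-labeled training data for the target object is needed.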