ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation (2312.02201v1)
Abstract: We introduce "ImageDream," an innovative image-prompt, multi-view diffusion model for 3D object generation. ImageDream stands out for its ability to produce 3D models of higher quality than existing state-of-the-art image-conditioned methods. Our approach places the objects in input images within a canonical camera coordinate system, improving visual geometry accuracy. The model applies varying levels of image-based control at each block inside the diffusion model: global control shapes the overall object layout, while local control fine-tunes the image details. The effectiveness of ImageDream is demonstrated through extensive evaluations using a standard prompt list. For more information, visit our project page at https://Image-Dream.github.io.
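The abstract's notion of per-block control can be illustrated with a minimal sketch: a transformer block whose latent tokens attend first to a single pooled image embedding (global control over layout) and then to per-patch image features (local control over detail). All class names, dimensions, and the two-path design here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class MultiLevelImageControlBlock(nn.Module):
    """Hypothetical diffusion-block sketch with two levels of image-prompt
    control: a global (pooled) embedding steers overall layout, and local
    per-patch features refine details. Dimensions are illustrative."""

    def __init__(self, dim=64, img_dim=32, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Global control: cross-attend to one pooled image embedding.
        self.global_attn = nn.MultiheadAttention(
            dim, n_heads, kdim=img_dim, vdim=img_dim, batch_first=True)
        # Local control: cross-attend to per-patch image features.
        self.local_attn = nn.MultiheadAttention(
            dim, n_heads, kdim=img_dim, vdim=img_dim, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.norm3, self.norm4 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, global_emb, local_feats):
        # x: (B, N, dim) latent tokens for one view.
        n = self.norm1(x)
        h, _ = self.self_attn(n, n, n)
        x = x + h
        # Global image control: a single token shapes the overall layout.
        h, _ = self.global_attn(self.norm2(x), global_emb, global_emb)
        x = x + h
        # Local image control: patch tokens fine-tune the details.
        h, _ = self.local_attn(self.norm3(x), local_feats, local_feats)
        x = x + h
        return x + self.mlp(self.norm4(x))

block = MultiLevelImageControlBlock()
x = torch.randn(2, 16, 64)   # latent tokens per view
g = torch.randn(2, 1, 32)    # pooled (global) image embedding
l = torch.randn(2, 49, 32)   # per-patch (local) image features
out = block(x, g, l)
print(out.shape)  # torch.Size([2, 16, 64])
```

The split mirrors the abstract's claim that coarse layout and fine detail are controlled at different granularities within each block.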
Authors: Peng Wang, Yichun Shi