Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors (2312.04963v1)
Abstract: Most 3D generation research focuses on up-projecting 2D foundation models into 3D space, either by minimizing a 2D Score Distillation Sampling (SDS) loss or by fine-tuning on multi-view datasets. Lacking explicit 3D priors, these methods often produce geometric anomalies and multi-view inconsistency. Recently, researchers have attempted to improve the geometric fidelity of 3D objects by training directly on 3D datasets, albeit at the cost of low-quality texture generation due to the limited texture diversity of 3D datasets. To harness the advantages of both approaches, we propose Bidirectional Diffusion (BiDiff), a unified framework that incorporates both a 3D and a 2D diffusion process, preserving 3D fidelity and 2D texture richness, respectively. Because a naive combination can yield inconsistent results, we further bridge the two processes with novel bidirectional guidance. In addition, our method can serve as an initialization for optimization-based models, improving both the quality of the resulting 3D model and the efficiency of optimization, reducing the generation process from 3.4 hours to 20 minutes. Experimental results show that our model achieves high-quality, diverse, and scalable 3D generation. Project website: https://bidiff.github.io/.
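The abstract describes coupling a 3D diffusion process with a 2D multi-view diffusion process through bidirectional guidance. The sketch below illustrates, purely for intuition, one plausible way such a coupled sampler could be structured: at each denoising step the 3D branch conditions the 2D branch via rendered views, and the 2D branch conditions the 3D branch via lifted features. It is not the paper's implementation; every callable and parameter name here (denoiser_3d, denoiser_2d, render_views, lift_to_3d, add_noise, the guidance weights) is an assumed placeholder.

```python
# Minimal sketch (not the authors' released code) of a bidirectional sampler
# coupling a 3D and a 2D diffusion process. All callables passed to the
# constructor are hypothetical placeholders.

import torch


class BidirectionalSampler:
    def __init__(self, denoiser_3d, denoiser_2d, render_views, lift_to_3d,
                 add_noise, num_steps=50, w_3d_to_2d=1.0, w_2d_to_3d=1.0):
        self.denoiser_3d = denoiser_3d    # 3D branch, e.g. a U-Net over a feature volume
        self.denoiser_2d = denoiser_2d    # 2D branch, e.g. a multi-view image diffusion model
        self.render_views = render_views  # renders an intermediate 3D estimate into each camera view
        self.lift_to_3d = lift_to_3d      # unprojects multi-view images back into the 3D representation
        self.add_noise = add_noise        # re-noises a clean estimate to a given timestep
        self.num_steps = num_steps
        self.w_3d_to_2d = w_3d_to_2d      # strength of 3D -> 2D guidance
        self.w_2d_to_3d = w_2d_to_3d      # strength of 2D -> 3D guidance

    @torch.no_grad()
    def sample(self, prompt_emb, x3d, x2d, cameras):
        """x3d: noisy 3D representation; x2d: noisy multi-view images."""
        for t in reversed(range(self.num_steps)):
            # 3D branch predicts a cleaner 3D representation from the current noisy state.
            x3d_hat = self.denoiser_3d(x3d, t, prompt_emb)

            # 3D -> 2D guidance: render the intermediate geometry and condition
            # the 2D branch on those renderings so the views stay consistent.
            renders = self.render_views(x3d_hat, cameras)
            x2d_hat = self.denoiser_2d(x2d, t, prompt_emb,
                                       cond=self.w_3d_to_2d * renders)

            # 2D -> 3D guidance: lift the denoised views back into 3D space so the
            # 3D branch inherits the 2D model's texture detail at the next step.
            lifted = self.lift_to_3d(x2d_hat, cameras)
            x3d_hat = x3d_hat + self.w_2d_to_3d * lifted

            if t > 0:
                # Re-noise both estimates down to the next noise level.
                x3d = self.add_noise(x3d_hat, t - 1)
                x2d = self.add_noise(x2d_hat, t - 1)
            else:
                x3d, x2d = x3d_hat, x2d_hat

        return x3d, x2d
```

The point of the sketch is that guidance flows in both directions inside every denoising step, rather than running a 3D model and a 2D model sequentially and merging their outputs afterward.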
Authors:
- Lihe Ding
- Shaocong Dong
- Zhanpeng Huang
- Zibin Wang
- Yiyuan Zhang
- Kaixiong Gong
- Dan Xu
- Tianfan Xue