Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors (2309.17261v2)

Published 29 Sep 2023 in cs.CV

Abstract: Reconstructing 3D objects from a single image guided by pretrained diffusion models has demonstrated promising outcomes. However, because existing methods adopt a case-agnostic, rigid guidance strategy, both their generalization to arbitrary objects and the 3D consistency of their reconstructions remain poor. In this work, we propose Consistent123, a case-aware two-stage method for highly consistent 3D asset reconstruction from one image using both 2D and 3D diffusion priors. In the first stage, Consistent123 uses only 3D structural priors for sufficient geometry exploitation, with a CLIP-based case-aware adaptive detection mechanism embedded in this process. In the second stage, 2D texture priors are introduced and progressively take on a dominant guiding role, delicately sculpting the details of the 3D model. Consistent123 thus tracks the evolving guidance requirements during optimization, adaptively providing adequate 3D geometric initialization and suitable 2D texture refinement for different objects. It achieves highly 3D-consistent reconstruction and exhibits strong generalization across various objects. Qualitative and quantitative experiments show that our method significantly outperforms state-of-the-art image-to-3D methods. See https://Consistent123.github.io for a more comprehensive exploration of our generated 3D assets.
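
The two-stage scheme in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the linear weight ramp, the function names, and the plateau-based stage-switch heuristic are all assumptions made for the sketch; the paper's actual CLIP-based detection mechanism and weighting schedule may differ.

```python
def guidance_weights(step, switch_step, total_steps):
    """Illustrative two-stage guidance schedule (assumed linear ramp).

    Stage 1 (step < switch_step): only the 3D structural prior guides
    the optimization. Stage 2: the 2D texture prior is introduced and
    its weight grows until it dominates.
    """
    if step < switch_step:
        return {"w3d": 1.0, "w2d": 0.0}
    # Fraction of stage 2 completed, clamped into [0, 1].
    t = (step - switch_step) / max(1, total_steps - switch_step)
    t = min(1.0, t)
    return {"w3d": 1.0 - t, "w2d": t}


def stage_switch(clip_similarities, window=5, eps=1e-3):
    """Hypothetical case-aware detector: declare the geometry stage
    converged once the CLIP similarity between rendered views and the
    input image plateaus over a sliding window of recent steps."""
    if len(clip_similarities) < window + 1:
        return False
    recent = clip_similarities[-(window + 1):]
    deltas = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return max(deltas) < eps
```

Because `stage_switch` reacts to the per-case CLIP signal rather than a fixed step count, different objects would transition from geometry-only guidance to texture-dominant guidance at different times, which is the "case-aware" behavior the abstract emphasizes.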

