
HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation (2403.00372v3)

Published 1 Mar 2024 in cs.CV

Abstract: 3D shape generation from text is a fundamental task in 3D representation learning. Text-shape pairs exhibit a hierarchical structure: a general text such as "chair" covers all chair shapes, while more detailed prompts refer to more specific shapes. Furthermore, both text and 3D shapes are inherently hierarchical structures. However, existing Text2Shape methods, such as SDFusion, do not exploit this structure. In this work, we propose HyperSDFusion, a dual-branch diffusion model that generates 3D shapes from a given text. Since hyperbolic space is suitable for handling hierarchical data, we propose to learn the hierarchical representations of text and 3D shapes in hyperbolic space. First, we introduce a hyperbolic text-image encoder to learn the sequential and multi-modal hierarchical features of text in hyperbolic space. In addition, we design a hyperbolic text-graph convolution module to learn the hierarchical features of text in hyperbolic space. To fully utilize these text features, we introduce a dual-branch structure to embed text features in 3D feature space. Finally, to endow the generated 3D shapes with a hierarchical structure, we devise a hyperbolic hierarchical loss. Our method is the first to explore the hyperbolic hierarchical representation for text-to-shape generation. Experiments on the existing text-to-shape paired dataset, Text2Shape, show that our method achieves state-of-the-art results. We release our implementation at HyperSDFusion.github.io.
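
To make the claim that hyperbolic space suits hierarchical data more concrete, the following is a minimal sketch of two standard Poincaré-ball operations: the exponential map at the origin and the geodesic distance. It is a generic illustration of hyperbolic geometry, not the paper's encoder, graph convolution module, or loss. The function names `expmap0` and `poincare_distance`, the curvature parameter `c`, and the toy vectors are all assumptions introduced here.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean vector (tangent at the origin) onto the
    Poincare ball of curvature -c. Hypothetical helper name."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball
    of curvature -c. Hypothetical helper name."""
    sq_norm = lambda u: sum(a * a for a in u)
    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


# Toy example (assumed values): a "general" concept sits near the origin,
# a more "specific" one lies farther out along the same direction.
origin = [0.0, 0.0]
general = expmap0([0.5, 0.0])
specific = expmap0([1.5, 0.0])

print(poincare_distance(origin, general))
print(poincare_distance(origin, specific))
```

In this model, distances grow rapidly toward the boundary of the ball, so there is room for many specific items far from the origin while general items stay close to it. That is the usual intuition for embedding tree-like data in hyperbolic space.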
