
EXIM: A Hybrid Explicit-Implicit Representation for Text-Guided 3D Shape Generation (2311.01714v2)

Published 3 Nov 2023 in cs.CV

Abstract: This paper presents a new text-guided technique for generating 3D shapes. The technique leverages a hybrid 3D shape representation, namely EXIM, combining the strengths of explicit and implicit representations. Specifically, the explicit stage controls the topology of the generated 3D shapes and enables local modifications, whereas the implicit stage refines the shape and paints it with plausible colors. Also, the hybrid approach separates the shape and color and generates color conditioned on shape to ensure shape-color consistency. Unlike the existing state-of-the-art methods, we achieve high-fidelity shape generation from natural-language descriptions without the need for time-consuming per-shape optimization or reliance on human-annotated texts during training or test-time optimization. Further, we demonstrate the applicability of our approach to generate indoor scenes with consistent styles using text-induced 3D shapes. Through extensive experiments, we demonstrate the compelling quality of our results and the high coherency of our generated shapes with the input texts, surpassing the performance of existing methods by a significant margin. Codes and models are released at https://github.com/liuzhengzhe/EXIM.
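The abstract describes a two-stage pipeline: an explicit stage that fixes the shape's topology, followed by an implicit stage that refines the geometry and predicts color conditioned on the shape. A minimal sketch of that control flow is below; all function names, the grid representation, and the placeholder logic are illustrative assumptions, not the authors' actual implementation (which uses wavelet-domain diffusion and neural implicit functions).

```python
def explicit_stage(text, grid_size=16):
    """Stage 1 (explicit): produce a coarse occupancy grid that fixes the
    topology of the generated shape. Placeholder: a centered solid cube
    stands in for the text-conditioned diffusion output."""
    lo, hi = grid_size // 4, 3 * grid_size // 4
    return [[[1.0 if lo <= x < hi and lo <= y < hi and lo <= z < hi else 0.0
              for z in range(grid_size)]
             for y in range(grid_size)]
            for x in range(grid_size)]

def implicit_stage(grid, point):
    """Stage 2 (implicit): given a query point in [0, 1]^3, refine occupancy
    and predict color *conditioned on the explicit shape*, so color is only
    painted where geometry exists (shape-color consistency). Placeholder:
    nearest-voxel lookup plus a constant color."""
    n = len(grid)
    ix, iy, iz = (min(n - 1, max(0, int(c * n))) for c in point)
    occupancy = grid[ix][iy][iz]
    color = (0.6, 0.4, 0.2) if occupancy > 0.5 else (0.0, 0.0, 0.0)
    return occupancy, color

def generate(text):
    """Run both stages: explicit topology first, then implicit refinement."""
    grid = explicit_stage(text)
    center = implicit_stage(grid, (0.5, 0.5, 0.5))  # inside the shape
    corner = implicit_stage(grid, (0.0, 0.0, 0.0))  # empty space
    return center, corner
```

Separating the stages this way is what enables the local edits the paper highlights: one can modify a region of the explicit grid and re-run only the implicit stage to repaint the affected geometry.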

Authors (6)
  1. Zhengzhe Liu (22 papers)
  2. Jingyu Hu (19 papers)
  3. Ka-Hei Hui (16 papers)
  4. Xiaojuan Qi (133 papers)
  5. Daniel Cohen-Or (172 papers)
  6. Chi-Wing Fu (104 papers)
Citations (9)

