WordRobe: Text-Guided Generation of Textured 3D Garments (2403.17541v2)
Abstract: In this paper, we tackle a new and challenging problem: text-driven generation of 3D garments with high-quality textures. We propose "WordRobe", a novel framework for generating unposed, textured 3D garment meshes from user-friendly text prompts. We achieve this by first learning a latent representation of 3D garments using a novel coarse-to-fine training strategy and a latent-disentanglement loss that promotes better latent interpolation. We then align the garment latent space to the CLIP embedding space in a weakly supervised manner, enabling text-driven 3D garment generation and editing. For appearance modeling, we leverage the zero-shot generation capability of ControlNet to synthesize view-consistent texture maps in a single feed-forward inference step, drastically reducing generation time compared to existing methods. We demonstrate superior performance over current state-of-the-art methods for learning a 3D garment latent space, garment interpolation, and text-driven texture synthesis, supported by quantitative evaluation and a qualitative user study. The unposed 3D garment meshes generated by WordRobe can be fed directly to standard cloth simulation and animation pipelines without any post-processing.
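To make the idea of latent interpolation concrete, here is a minimal sketch of blending two garment latent codes. It is an illustration only, not the paper's implementation: the function name `interpolate_latents`, the 4-dimensional toy codes, and the choice of plain linear interpolation are all assumptions, and the garment encoder and decoder that would produce and consume such codes are not shown.

```python
from typing import List, Sequence


def interpolate_latents(
    z_a: Sequence[float], z_b: Sequence[float], steps: int
) -> List[List[float]]:
    """Linearly blend two latent codes.

    Returns `steps` codes running from z_a to z_b, endpoints included.
    (Hypothetical helper; linear blending is an assumption.)
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same dimensionality")
    if steps < 2:
        raise ValueError("need at least two steps (the two endpoints)")

    path = []
    for i in range(steps):
        t = i / (steps - 1)  # blend weight, moving from 0.0 to 1.0
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Toy latent codes standing in for two encoded garments (made-up values).
z_shirt = [0.0, 1.0, -2.0, 0.5]
z_dress = [1.0, -1.0, 2.0, 0.5]

path = interpolate_latents(z_shirt, z_dress, steps=5)
for code in path:
    print(code)
```

In a full pipeline, each intermediate code would be passed to a garment decoder to obtain a mesh. A well-disentangled latent space is what makes those in-between meshes look like plausible garments rather than artifacts.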
- Astitva Srivastava
- Pranav Manu
- Amit Raj
- Varun Jampani
- Avinash Sharma