DressCode: Autoregressively Sewing and Generating Garments from Text Guidance (2401.16465v4)
Abstract: Apparel's significant role in human appearance underscores the importance of garment digitalization for digital human creation. Recent advances in 3D content creation are pivotal for digital human creation. Nonetheless, garment generation from text guidance is still nascent. We introduce a text-driven 3D garment generation framework, DressCode, which aims to democratize design for novices and offer immense potential in fashion design, virtual try-on, and digital human creation. We first introduce SewingGPT, a GPT-based architecture integrating cross-attention with text-conditioned embedding to generate sewing patterns with text guidance. We then tailor a pre-trained Stable Diffusion to generate tile-based Physically-based Rendering (PBR) textures for the garments. By leveraging an LLM, our framework generates CG-friendly garments through natural language interaction. It also facilitates pattern completion and texture editing, streamlining the design process through user-friendly interaction. This framework fosters innovation by allowing creators to freely experiment with designs and incorporate unique elements into their work. In comprehensive evaluations and comparisons with other state-of-the-art methods, our method shows superior quality and alignment with input prompts. User studies further validate the high quality of our rendering results, highlighting the framework's practical utility and potential in production settings. Our project page is https://IHe-KaiI.github.io/DressCode/.
Authors:
- Kai He
- Kaixin Yao
- Qixuan Zhang
- Jingyi Yu
- Lingjie Liu
- Lan Xu