Tell2Design: A Dataset for Language-Guided Floor Plan Generation (2311.15941v1)
Abstract: We consider the task of generating designs directly from natural language descriptions, with floor plan generation as the initial research area. Language-conditional generative models have recently been very successful in generating high-quality artistic images. However, designs must satisfy constraints that are not present in artistic image generation, particularly spatial and relational constraints. We make multiple contributions to initiate research on this task. First, we introduce a novel dataset, Tell2Design (T2D), which contains more than 80k floor plan designs associated with natural language instructions. Second, we propose a Sequence-to-Sequence model that can serve as a strong baseline for future research. Third, we benchmark this task with several text-conditional image generation models. We conclude by conducting human evaluations on the generated samples and providing an analysis of human performance. We hope our contributions will propel the research on language-guided design generation forward.
Authors: Sicong Leng, Yang Zhou, Mohammed Haroon Dupty, Wee Sun Lee, Sam Conrad Joyce, Wei Lu