
Tell2Design: A Dataset for Language-Guided Floor Plan Generation (2311.15941v1)

Published 27 Nov 2023 in cs.CL and cs.CV

Abstract: We consider the task of generating designs directly from natural language descriptions, and consider floor plan generation as the initial research area. Language conditional generative models have recently been very successful in generating high-quality artistic images. However, designs must satisfy different constraints that are not present in generating artistic images, particularly spatial and relational constraints. We make multiple contributions to initiate research on this task. First, we introduce a novel dataset, Tell2Design (T2D), which contains more than 80k floor plan designs associated with natural language instructions. Second, we propose a Sequence-to-Sequence model that can serve as a strong baseline for future research. Third, we benchmark this task with several text-conditional image generation models. We conclude by conducting human evaluations on the generated samples and providing an analysis of human performance. We hope our contributions will propel the research on language-guided design generation forward.

Authors (6)
  1. Sicong Leng
  2. Yang Zhou
  3. Mohammed Haroon Dupty
  4. Wee Sun Lee
  5. Sam Conrad Joyce
  6. Wei Lu

Summary

Analysis of "Tell2Design: A Dataset for Language-Guided Floor Plan Generation"

The paper "Tell2Design: A Dataset for Language-Guided Floor Plan Generation" tackles design generation from natural language descriptions, a task that has received little prior attention. It introduces a new dataset, Tell2Design (T2D), and proposes a Sequence-to-Sequence (Seq2Seq) model intended as a strong baseline for future research on language-guided design.

Key Contributions

  1. Novel Dataset: The T2D dataset consists of over 80,000 floor plan designs paired with natural language instructions. This dataset fills a crucial gap in the area of design generation by enabling the exploration of design tasks driven directly by language inputs.
  2. Seq2Seq Baseline Model: The authors present a Sequence-to-Sequence model that interprets natural language instructions to generate floor plans, demonstrating a practical approach to this new form of design generation.
  3. Evaluation and Comparative Analysis: The paper benchmarks several text-conditional image generation models on this task, including Obj-GAN, CogView, and Imagen. The evaluation highlights the strengths and limitations of these models in meeting the precision that floor plan generation requires.
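
The summary does not say how a floor plan is turned into something a Seq2Seq model can emit. The sketch below illustrates one plausible encoding: each room becomes a short text chunk holding its name and an axis-aligned box, and chunks are joined into one target string. The `Room` class, the `x= y= w= h=` token format, and the `|` separator are all assumptions made for illustration, not the authors' format.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Room:
    """Hypothetical room record: a name plus an axis-aligned box in pixels."""
    name: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


# Assumed chunk format: "<name> x=<int> y=<int> w=<int> h=<int>"
_CHUNK = re.compile(r"(.+?) x=(-?\d+) y=(-?\d+) w=(\d+) h=(\d+)")


def serialize(rooms: List[Room]) -> str:
    """Flatten a list of rooms into a single target string."""
    return " | ".join(
        f"{r.name} x={r.x} y={r.y} w={r.w} h={r.h}" for r in rooms
    )


def parse(sequence: str) -> List[Room]:
    """Recover rooms from a generated string, skipping malformed chunks."""
    rooms = []
    for chunk in sequence.split("|"):
        match = _CHUNK.fullmatch(chunk.strip())
        if match is None:
            continue  # a generated sequence is not guaranteed to be well formed
        name, x, y, w, h = match.groups()
        rooms.append(Room(name, int(x), int(y), int(w), int(h)))
    return rooms


plan = [Room("living room", 0, 0, 120, 90), Room("bedroom", 120, 0, 80, 90)]
target = serialize(plan)
assert parse(target) == plan  # the encoding round-trips
```

Parsing tolerantly matters in a setup like this because a text decoder can emit incomplete or ill-formed chunks; dropping them lets the remaining rooms still be scored.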

Strong Numerical Results

The proposed Seq2Seq baseline achieves a macro IoU (intersection over union) of 54.34 on T2D, outperforming the other benchmarked methods. This indicates that the model can align generated designs with the detailed requirements specified in the input text. Integrating boundary information into the Seq2Seq approach further improves performance, which suggests it helps the model respect spatial constraints.
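
To make the macro IoU figure above concrete, here is a minimal sketch of how such a score could be computed. It assumes rooms are axis-aligned boxes `(x_min, y_min, x_max, y_max)` keyed by room name, that macro IoU is the mean of per-room IoU, and that micro IoU pools intersections and unions across rooms; the paper's exact definitions and room geometry may differ. The functions return fractions in [0, 1], not percentages.

```python
from typing import Dict, Tuple

# Assumed representation: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_area(b: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (zero if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def macro_iou(pred: Dict[str, Box], truth: Dict[str, Box]) -> float:
    """Mean per-room IoU over ground-truth rooms; a missing room scores 0."""
    if not truth:
        return 0.0
    scores = []
    for name, gt in truth.items():
        p = pred.get(name)
        if p is None:
            scores.append(0.0)
            continue
        inter = intersection_area(p, gt)
        union = box_area(p) + box_area(gt) - inter
        scores.append(inter / union if union > 0 else 0.0)
    return sum(scores) / len(scores)


def micro_iou(pred: Dict[str, Box], truth: Dict[str, Box]) -> float:
    """Pooled IoU: summed intersections divided by summed unions."""
    inter_total = 0.0
    union_total = 0.0
    for name, gt in truth.items():
        p = pred.get(name)
        inter = intersection_area(p, gt) if p is not None else 0.0
        pred_area = box_area(p) if p is not None else 0.0
        inter_total += inter
        union_total += pred_area + box_area(gt) - inter
    return inter_total / union_total if union_total > 0 else 0.0


truth = {"bedroom": (0, 0, 4, 4), "kitchen": (4, 0, 8, 4)}
pred = {"bedroom": (0, 0, 4, 4), "kitchen": (6, 0, 10, 4)}
print(round(macro_iou(pred, truth), 3))  # 0.667
print(round(micro_iou(pred, truth), 3))  # 0.6
```

In the example, the bedroom matches exactly (IoU 1) while the kitchen is shifted by half its width (IoU 1/3), so the macro score averages to 2/3. The micro score is lower because pooling weights the badly placed room's larger union more heavily.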

Implications and Future Directions

The research has implications for both the theory and practice of AI-driven design. By framing floor plan generation as a process guided by natural language, the paper opens a path toward interactive design systems. Potential future developments include:

  • Improving Robustness: Enhancing robustness in understanding diverse and sometimes ambiguous human instructions remains a vital area for improvement.
  • Design Diversity: Incorporating mechanisms to diversify output designs, reflecting the inherent variability in possible solutions, could provide broader applicability.
  • Domain Extension: Exploring language-guided design in other domains such as document layouts or UI design could significantly broaden the impact of this research.

Overall, this paper sets a foundation for further exploration into language-guided design and its integration into practical AI applications, stressing the importance of accommodating the complexity and precision necessitated by real-world design constraints.
