Exploring Text-based Realistic Building Facades Editing Application (2405.02967v1)
Abstract: This paper explores the use of diffusion models and textual guidance for localized editing of building facades, addressing the growing demand for sophisticated editing methodologies in architectural design and urban planning. Leveraging the robust generative capabilities of diffusion models, this study presents a promising avenue for realistically synthesizing and modifying architectural facades. Through iterative diffusion and text descriptions, these models capture both the intricate global and local structures inherent in architectural facades, effectively navigating the complexity of such designs. The paper also examines the broader potential of diffusion models, including the generation of novel facade designs, the enhancement of existing facades, and personalized customization. Despite their promise, diffusion models face obstacles such as computational resource constraints and data imbalances. To address these challenges, the study introduces the Blended Latent Diffusion method for architectural facade editing, accompanied by a comprehensive visual analysis of its viability and efficacy. Through these endeavors, we aim to advance the field of architectural facade editing and its practical application.
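The Blended Latent Diffusion method mentioned above confines edits to a masked region by blending, at every denoising step, the model's evolving latent with a correspondingly re-noised latent of the original image outside the mask, so the background is preserved while the masked area follows the text prompt. A minimal NumPy sketch of that blending loop, with a placeholder in place of the text-conditioned denoiser and a simplified linear noise schedule (all names hypothetical, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_to_level(latent, t, num_steps=50):
    """Forward-diffuse a clean latent to noise level t (simplified linear schedule)."""
    alpha = 1.0 - t / num_steps  # alpha = 1 at t = 0 (no noise), 0 at t = num_steps
    return np.sqrt(alpha) * latent + np.sqrt(1.0 - alpha) * rng.standard_normal(latent.shape)

def denoise_step(latent, t):
    """Stand-in for one reverse step of a text-conditioned U-Net denoiser."""
    return latent * 0.95  # placeholder update, not a real model

def blended_latent_edit(orig_latent, mask, num_steps=50):
    """Edit only where mask == 1; preserve the original content elsewhere."""
    latent = rng.standard_normal(orig_latent.shape)  # start from pure noise
    for t in range(num_steps, 0, -1):
        latent = denoise_step(latent, t)
        # Key step: outside the mask, overwrite with the original latent
        # noised to the matching level, so the background stays faithful.
        background = noise_to_level(orig_latent, t - 1, num_steps)
        latent = mask * latent + (1.0 - mask) * background
    return latent

mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0  # edit only the central patch
orig = rng.standard_normal((8, 8))
edited = blended_latent_edit(orig, mask)
```

Because the final blend happens at noise level zero, the unmasked region of `edited` matches `orig` exactly, which is the property that makes the editing localized.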