Generative AI for Urban Design: Integrating Human Expertise and Multimodal Diffusion Models
This paper presents a methodological approach to advancing urban design through the integration of generative artificial intelligence (GenAI) with human expertise. Recognizing the inefficiencies and lack of dynamic control in current AI-driven, end-to-end urban design processes, the paper proposes a stepwise generative framework. The framework is designed to enhance collaboration between AI and human designers, facilitating more adaptable and iterative design workflows that better align with real-world urban design practices.
Urban design is inherently complex, involving diverse stakeholders, site-specific constraints, and iterative planning phases. Traditional AI approaches tend to flatten this complexity into rigid end-to-end pipelines that limit adaptability and disregard intermediate design inputs. To overcome these limitations, the proposed framework decomposes the process into a sequence of design stages: (1) road network and land use planning, (2) building layout planning, and (3) detailed planning and rendering. Each stage employs a multimodal diffusion model capable of integrating textual prompts and image-based constraints to generate preliminary designs, which human designers can then review and refine, allowing seamless intervention and iterative refinement throughout the design process, as sketched below.
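To make the stepwise workflow concrete, the following Python sketch chains three ControlNet-conditioned generation calls via Hugging Face diffusers, with a human-review hook between stages. The stage prompts, the checkpoint identifiers, and the human_review() function are illustrative assumptions, not the paper's released implementation; the paper's models would be fine-tuned on urban design diagrams.

```python
# Sketch of the three-stage workflow: each stage's (possibly human-edited)
# output becomes the image constraint for the next stage.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

def build_stage(controlnet_id: str) -> StableDiffusionControlNetPipeline:
    """Load one ControlNet-conditioned Stable Diffusion pipeline."""
    controlnet = ControlNetModel.from_pretrained(
        controlnet_id, torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    return pipe.to("cuda")

def human_review(image: Image.Image) -> Image.Image:
    """Placeholder for the human-in-the-loop step: a designer inspects the
    stage output and may edit it (e.g., redraw a road) before it conditions
    the next stage. A no-op in this sketch."""
    return image

# Hypothetical per-stage prompts and a generic segmentation ControlNet
# checkpoint standing in for the paper's stage-specific models.
stages = [
    ("a road network and land use plan for a dense mixed-use district",
     "lllyasviel/sd-controlnet-seg"),
    ("a building layout following the given roads and land use zones",
     "lllyasviel/sd-controlnet-seg"),
    ("a detailed rendered site plan with greenery and streetscape",
     "lllyasviel/sd-controlnet-seg"),
]

control = Image.open("site_boundary.png").convert("RGB")  # initial constraint
for prompt, checkpoint in stages:
    pipe = build_stage(checkpoint)  # rebuilt per stage for clarity
    result = pipe(prompt, image=control, num_inference_steps=30).images[0]
    control = human_review(result)  # refined output conditions the next stage
control.save("final_plan.png")
```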
The framework builds on ControlNet, an architecture that augments a pretrained diffusion model so that image synthesis can be guided jointly by textual instructions and image-based constraints. This technique enables the generation of design diagrams that maintain visual fidelity and adhere closely to human inputs. The efficacy of ControlNet is assessed against baseline models, including Pix2Pix and a metric-enhanced Pix2Pix variant, along three dimensions: visual fidelity, instruction compliance, and design diversity.
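As a rough illustration of what ControlNet adds to a pretrained diffusion model, the sketch below implements its core zero-convolution idea on a toy encoder block: a trainable copy of the block receives the spatial condition, and its output is merged back through zero-initialized convolutions, so training starts exactly from the frozen base model's behavior. The toy block and tensor shapes are assumptions for illustration; the real ControlNet wraps Stable Diffusion's UNet encoder.

```python
# Minimal sketch of the ControlNet conditioning mechanism (zero convolutions
# around a trainable copy of a frozen block), not the full architecture.
import copy
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    """1x1 convolution initialized to zero, so the control branch
    contributes nothing at the start of training."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlledBlock(nn.Module):
    def __init__(self, base_block: nn.Module, channels: int):
        super().__init__()
        self.base = base_block                    # frozen pretrained block
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.control = copy.deepcopy(base_block)  # trainable copy
        self.zin, self.zout = zero_conv(channels), zero_conv(channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # At initialization both zero convs output zero, so this reduces
        # to base(x); the condition's influence is learned gradually.
        return self.base(x) + self.zout(self.control(x + self.zin(cond)))

block = nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.SiLU())
ctrl = ControlledBlock(block, channels=8)
x = torch.randn(1, 8, 64, 64)     # latent features
cond = torch.randn(1, 8, 64, 64)  # encoded image constraint
print(ctrl(x, cond).shape)        # torch.Size([1, 8, 64, 64])
```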
Empirical applications on datasets from Chicago and New York City demonstrate that the proposed framework significantly outperforms these GAN-based baselines on all three dimensions. The diffusion model not only generates higher-fidelity images that reflect real-world urban forms, but also complies more closely with human-specified instructions on road density, land use composition, and building layout characteristics, underscoring ControlNet's capacity to translate urban design intent into cohesive spatial layouts. Moreover, its ability to generate diverse design options from identical inputs highlights the model's flexibility in exploring alternative design outcomes, which is essential for helping urban planners select and refine urban forms.
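The summary does not spell out how instruction compliance is computed, but a check of this kind could look like the following sketch, which compares a prompted road density against the density measured in a generated plan. The gray road color and the pixel-fraction definition of density are illustrative assumptions, not the paper's exact metric.

```python
# Illustrative instruction-compliance check: how far is the measured road
# density of a generated plan from the density requested in the prompt?
import numpy as np
from PIL import Image

ROAD_COLOR = np.array([128, 128, 128])  # assumed gray coding for roads

def road_density(plan_path: str, tol: int = 20) -> float:
    """Fraction of pixels whose color lies within tol of the road color."""
    img = np.asarray(Image.open(plan_path).convert("RGB"), dtype=np.int16)
    mask = (np.abs(img - ROAD_COLOR) <= tol).all(axis=-1)
    return float(mask.mean())

def compliance_error(requested: float, plan_path: str) -> float:
    """Absolute gap between prompted and measured density;
    lower means the model followed the instruction more closely."""
    return abs(requested - road_density(plan_path))

# e.g., if the prompt asked for 15% road coverage:
# print(compliance_error(0.15, "generated_plan.png"))
```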
From both practical and theoretical perspectives, the framework introduces an advanced paradigm for integrating AI into urban design. Practically, it gives urban planners more efficient tools for generating and visualizing urban designs. By iteratively incorporating human expertise, the framework bridges the gap between automated design generation and the nuanced, context-specific decisions that underpin effective urban planning. Theoretically, the paper contributes to the expanding field of human-AI interaction in spatial design, offering a robust model for structuring urban design processes that are adaptable, interactive, and context-aware.
Future research could expand the dataset scope to cover diverse global urban forms beyond the typical grid layouts of U.S. cities. Incorporating broader contextual variables into the diffusion models, such as socio-economic indicators and community stakeholder inputs, could also provide richer, context-aware guidance for urban design. As GenAI technology progresses, this paper lays a foundation for further exploration and refinement of urban design methodologies that effectively combine machine efficiency with human creativity.