Generative AI for Urban Design: Integrating Human Expertise and Multimodal Diffusion Models
This paper presents a methodological approach to advancing urban design through the integration of generative artificial intelligence (GenAI) with human expertise. Recognizing the inefficiencies and lack of dynamic control in current AI-driven, end-to-end urban design processes, the paper proposes a stepwise generative framework. The framework is designed to enhance collaboration between AI and human designers, facilitating more adaptable and iterative design workflows that better align with real-world urban design practices.
Urban design is inherently complex, involving diverse stakeholders, site-specific constraints, and iterative planning phases. Traditional AI approaches tend to flatten this complexity into rigid end-to-end pipelines that limit adaptability and disregard intermediate design inputs. To overcome these limitations, the proposed framework decomposes the process into a sequence of design stages: (1) road network and land use planning, (2) building layout planning, and (3) detailed planning and rendering. Each stage employs a multimodal diffusion model capable of integrating textual prompts and image-based constraints to generate preliminary designs, which human designers can then review and refine, allowing seamless intervention and iterative refinement throughout the design process, as sketched below.
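To make the stepwise workflow concrete, the following Python sketch chains three ControlNet-conditioned generation calls via Hugging Face diffusers, with a human-review hook between stages. The stage prompts, the checkpoint identifiers, and the human_review() function are illustrative assumptions, not the paper's released implementation; the paper's models would be fine-tuned on urban design diagrams.

```python
# Sketch of the three-stage workflow: each stage's (possibly human-edited)
# output becomes the image constraint for the next stage.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

def build_stage(controlnet_id: str) -> StableDiffusionControlNetPipeline:
    """Load one ControlNet-conditioned Stable Diffusion pipeline."""
    controlnet = ControlNetModel.from_pretrained(
        controlnet_id, torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    return pipe.to("cuda")

def human_review(image: Image.Image) -> Image.Image:
    """Placeholder for the human-in-the-loop step: a designer inspects the
    stage output and may edit it (e.g., redraw a road) before it conditions
    the next stage. A no-op in this sketch."""
    return image

# Hypothetical per-stage prompts and a generic segmentation ControlNet
# checkpoint standing in for the paper's stage-specific models.
stages = [
    ("a road network and land use plan for a dense mixed-use district",
     "lllyasviel/sd-controlnet-seg"),
    ("a building layout following the given roads and land use zones",
     "lllyasviel/sd-controlnet-seg"),
    ("a detailed rendered site plan with greenery and streetscape",
     "lllyasviel/sd-controlnet-seg"),
]

control = Image.open("site_boundary.png").convert("RGB")  # initial constraint
for prompt, checkpoint in stages:
    pipe = build_stage(checkpoint)  # rebuilt per stage for clarity
    result = pipe(prompt, image=control, num_inference_steps=30).images[0]
    control = human_review(result)  # refined output conditions the next stage
control.save("final_plan.png")
```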
The framework builds on ControlNet, an architecture that augments a pretrained diffusion model so that image synthesis can be guided jointly by textual instructions and image-based constraints. This technique enables the generation of design diagrams that maintain visual fidelity and adhere closely to human inputs. The efficacy of ControlNet is assessed against baseline models, including Pix2Pix and a metric-enhanced Pix2Pix variant, along three dimensions: visual fidelity, instruction compliance, and design diversity.
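As a rough illustration of what ControlNet adds to a pretrained diffusion model, the sketch below implements its core zero-convolution idea on a toy encoder block: a trainable copy of the block receives the spatial condition, and its output is merged back through zero-initialized convolutions, so training starts exactly from the frozen base model's behavior. The toy block and tensor shapes are assumptions for illustration; the real ControlNet wraps Stable Diffusion's UNet encoder.

```python
# Minimal sketch of the ControlNet conditioning mechanism (zero convolutions
# around a trainable copy of a frozen block), not the full architecture.
import copy
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    """1x1 convolution initialized to zero, so the control branch
    contributes nothing at the start of training."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlledBlock(nn.Module):
    def __init__(self, base_block: nn.Module, channels: int):
        super().__init__()
        self.base = base_block                    # frozen pretrained block
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.control = copy.deepcopy(base_block)  # trainable copy
        self.zin, self.zout = zero_conv(channels), zero_conv(channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # At initialization both zero convs output zero, so this reduces
        # to base(x); the condition's influence is learned gradually.
        return self.base(x) + self.zout(self.control(x + self.zin(cond)))

block = nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.SiLU())
ctrl = ControlledBlock(block, channels=8)
x = torch.randn(1, 8, 64, 64)     # latent features
cond = torch.randn(1, 8, 64, 64)  # encoded image constraint
print(ctrl(x, cond).shape)        # torch.Size([1, 8, 64, 64])
```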
Empirical applications on datasets from Chicago and New York City demonstrate that the proposed framework significantly outperforms these GAN-based baselines on all three dimensions. The diffusion model not only generates higher-fidelity images that reflect real-world urban forms, but also complies more closely with human-specified instructions on road density, land use composition, and building layout characteristics, underscoring ControlNet's capacity to translate urban design intent into cohesive spatial layouts. Moreover, its ability to generate diverse design options from identical inputs highlights the model's flexibility in exploring alternative design outcomes, which is essential for helping urban planners select and refine urban forms.
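The summary does not spell out how instruction compliance is computed, but a check of this kind could look like the following sketch, which compares a prompted road density against the density measured in a generated plan. The gray road color and the pixel-fraction definition of density are illustrative assumptions, not the paper's exact metric.

```python
# Illustrative instruction-compliance check: how far is the measured road
# density of a generated plan from the density requested in the prompt?
import numpy as np
from PIL import Image

ROAD_COLOR = np.array([128, 128, 128])  # assumed gray coding for roads

def road_density(plan_path: str, tol: int = 20) -> float:
    """Fraction of pixels whose color lies within tol of the road color."""
    img = np.asarray(Image.open(plan_path).convert("RGB"), dtype=np.int16)
    mask = (np.abs(img - ROAD_COLOR) <= tol).all(axis=-1)
    return float(mask.mean())

def compliance_error(requested: float, plan_path: str) -> float:
    """Absolute gap between prompted and measured density;
    lower means the model followed the instruction more closely."""
    return abs(requested - road_density(plan_path))

# e.g., if the prompt asked for 15% road coverage:
# print(compliance_error(0.15, "generated_plan.png"))
```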
From both practical and theoretical perspectives, the framework introduces an advanced paradigm for integrating AI into urban design. Practically, it gives urban planners more efficient tools for generating and visualizing urban designs. By iteratively incorporating human expertise, the framework bridges the gap between automated design generation and the nuanced, context-specific decisions that underpin effective urban planning. Theoretically, the paper contributes to the expanding field of human-AI interaction in spatial design, offering a robust model for structuring urban design processes that are adaptable, interactive, and context-aware.
Future research could expand the dataset scope to cover diverse global urban forms beyond the typical grid layouts of U.S. cities. Incorporating broader contextual variables into the diffusion models, such as socio-economic indicators and community stakeholder inputs, could also provide richer, context-aware guidance for urban design. As GenAI technology progresses, this paper lays a foundation for further exploration and refinement of urban design methodologies that effectively combine machine efficiency with human creativity.