- The paper introduces an agent-based reinforcement learning approach using Deep Q-Learning to procedurally generate urban land use maps optimized by a reward function incorporating Land Use and Transport Interaction (LUTI) principles.
- A key technical aspect is integrating grid-based land use with graph-based road networks to accurately calculate accessibility, which is crucial for the agent's decisions and the LUTI-derived utility scores.
- The framework simulates realistic urban phenomena like zoning and accessibility dynamics through the agent's learned policy, balancing local land use compatibility with transportation network influence.
Agent-Based Procedural City Generation with LUTI Integration
The paper "An agent-based approach to procedural city generation incorporating Land Use and Transport Interaction models" (2211.01959) presents a methodology for generating artificial city layouts by assigning land uses to discrete plots within a predefined road network. The approach utilizes a single reinforcement learning agent, trained using Deep Q-Learning (DQN), to sequentially select land use types (residential, commercial, industrial, recreational) for undeveloped cells. A core aspect of this work is the incorporation of principles derived from Land Use and Transport Interaction (LUTI) models directly into the agent's reward function, aiming to produce more realistic urban structures compared to purely geometric or rule-based procedural generation methods.
Agent Framework and LUTI-Based Reward Structure
The system operates on a grid representing potential land plots, overlaid with an immutable road network imported from sources like OpenStreetMap or SUMO. A single "builder" agent selects undeveloped grid cells, prioritized by their accessibility, and assigns one of the four available land use types. The agent's state is defined by a tensor capturing the land use types and normalized accessibility values of cells within a local perception radius K.
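As a minimal sketch of this state representation (the channel layout, one-hot encoding, and function names are assumptions for illustration, not taken from the paper), the local perception window can be built as a tensor with one channel per land use type plus an accessibility channel:

```python
import numpy as np

# Land use codes: 0 = undeveloped, 1..4 = residential, commercial,
# industrial, recreational (an assumed encoding).
NUM_USES = 4

def local_state(land_use, accessibility, row, col, k):
    """Build the agent's state for the cell at (row, col): one-hot land
    use channels plus normalized accessibility within radius k."""
    size = 2 * k + 1
    # Zero-pad both maps so windows near the border stay well-defined.
    padded_use = np.zeros((land_use.shape[0] + 2 * k,
                           land_use.shape[1] + 2 * k), dtype=int)
    padded_use[k:-k, k:-k] = land_use
    padded_acc = np.zeros_like(padded_use, dtype=float)
    padded_acc[k:-k, k:-k] = accessibility

    window_use = padded_use[row:row + size, col:col + size]
    window_acc = padded_acc[row:row + size, col:col + size]

    # Channels 0..3: one per land use type; channel 4: accessibility.
    state = np.zeros((NUM_USES + 1, size, size), dtype=np.float32)
    for use in range(1, NUM_USES + 1):
        state[use - 1] = (window_use == use)
    state[NUM_USES] = window_acc
    return state
```

A state of this shape feeds naturally into a small convolutional Q-network whose output has one Q-value per land use type.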
The integration of LUTI principles is primarily achieved through the reward mechanism. The global reward, maximized by the DQN agent, is the sum of individual scores for all developed cells. The score for a cell depends on its assigned land use and its context, defined by LUTI-inspired formulae:
- Residential Score (s_h): Positively influenced by the cell's accessibility (a_c) and the number of nearby residential (N_h), commercial (N_c), and recreational (N_r) cells within radius K, while being negatively impacted by nearby industrial cells (N_i):
s_h = w_a·a_c + w_hh·N_h + w_hc·N_c − w_hi·N_i + w_hr·N_r
- Commercial Score (s_c): Positively influenced by accessibility (a_c) and nearby residential cells (N_h). It exhibits a non-linear relationship with nearby commercial cells (N_c) to model market saturation effects (initially positive, then potentially negative) and is negatively influenced by nearby industrial cells (N_i):
s_c = w_a·a_c + w_ch·N_h + w_cc·N_c − w_cc2·N_c² − w_ci·N_i
- Industrial Score (s_i): Favors peripheral locations, using the distance to the nearest network node (d_N(c)) rather than overall accessibility. It is negatively influenced by nearby residential (N_h), commercial (N_c), and recreational (N_r) cells:
s_i = w_id·d_N(c) − w_ih·N_h − w_ic·N_c − w_ir·N_r
- Recreational Score (s_r): Primarily influenced positively by nearby residential cells (N_h):
s_r = w_rh·N_h
The weights (w) in these functions are hyperparameters that encode the relative importance of accessibility and inter-land-use adjacencies, effectively defining the zoning rules and utility perceptions the agent learns to optimize. The agent learns a policy π(s) via DQN with experience replay to select the action (land use type) that maximizes the expected cumulative reward, leading to a final land use map.
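As an illustration, the four score functions translate directly into code. The weight values below are placeholders chosen for readability, not the paper's calibrated parameters:

```python
# Illustrative weights; the paper's actual values are hyperparameters.
W = dict(a=1.0, hh=0.2, hc=0.3, hi=0.5, hr=0.4,   # residential
         ch=0.4, cc=0.3, cc2=0.05, ci=0.3,        # commercial
         id_=0.5, ih=0.3, ic=0.2, ir=0.2,         # industrial
         rh=0.6)                                  # recreational

def residential_score(a_c, N_h, N_c, N_i, N_r):
    # Accessibility and most neighbors attract; industry repels.
    return W['a']*a_c + W['hh']*N_h + W['hc']*N_c - W['hi']*N_i + W['hr']*N_r

def commercial_score(a_c, N_h, N_c, N_i):
    # The quadratic N_c term models market saturation: extra shops
    # help at first, then crowd each other out.
    return W['a']*a_c + W['ch']*N_h + W['cc']*N_c - W['cc2']*N_c**2 - W['ci']*N_i

def industrial_score(d_N, N_h, N_c, N_r):
    # Rewards distance from the network rather than accessibility.
    return W['id_']*d_N - W['ih']*N_h - W['ic']*N_c - W['ir']*N_r

def recreational_score(N_h):
    return W['rh']*N_h
```

The global reward the DQN maximizes is then the sum of these per-cell scores over all developed cells.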
Integrated Representation of Road Network and Land Use Grid
A significant contribution is the method used to integrate the graph-based road network and the grid-based land use representation for calculating accessibility. The road network is a directed graph G=(V,E) where edge weights represent travel times. The land use map is a grid C.
- Cell-to-Edge Mapping: Each grid cell c ∈ C is associated with its geometrically nearest edge e_c ∈ E. Let p_c be the point on e_c closest to the center of cell c.
- Inter-Cell Travel Time: The travel time t(c_i, c_j) between two cells c_i and c_j is the sum of: the time from c_i to p_{c_i} (Euclidean distance scaled by a speed factor); the time along edge e_{c_i} from p_{c_i} to that edge's target node; the shortest-path time on the graph G from that target node to the source node of e_{c_j}, computed with Dijkstra's algorithm; the time along edge e_{c_j} from its source node to p_{c_j}; and finally the time from p_{c_j} to c_j.
- Accessibility Metric: The accessibility T(c) of a cell c is the inverse of the average travel time to all other cells c_i ∈ C: T(c) = ((1/|C|) · Σ_{c_i ∈ C} t(c, c_i))^(−1). This value is normalized to [0, 1] to give the a_c used in the reward functions.
- Computational Optimization: Calculating all-pairs travel times for T(c) is computationally expensive (O(|C|² · Dijkstra)). The paper proposes partitioning cells by their nearest edge (C_e = {c ∈ C | e_c = e}). By pre-calculating sums of travel times within these partitions (T_e = Σ_{c_i ∈ C_e} t(c, c_i)) and reusing shortest paths between edge nodes, the calculation is accelerated, although the fundamental complexity remains quadratic in the number of cells. This optimization makes the approach feasible for moderately sized maps (tested up to 4096 cells, or 16 km²).
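The travel-time decomposition above can be sketched in a few dozen lines. The adjacency-dict graph and the per-cell fields (`p`, `to_target`, `from_source`, `target_node`, `source_node`) are assumed data structures for illustration; the paper does not specify them at this level:

```python
import heapq
from math import hypot

def dijkstra(adj, source):
    """Shortest-path travel times from source over {node: [(nbr, time), ...]}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def accessibility(cells, adj, speed=1.0):
    """Sketch of T(c): inverse of the mean travel time to every other cell.
    Each cell is a dict with assumed fields: its 'centre', the nearest
    point 'p' on its edge, times 'to_target'/'from_source' along that
    edge, and the edge's 'target_node'/'source_node'."""
    T = {}
    for c in cells:
        reach = dijkstra(adj, c['target_node'])  # graph leg starts at e_c's target node
        total = 0.0
        for o in cells:
            if o is c:
                continue  # t(c, c) = 0
            walk_out = hypot(c['centre'][0] - c['p'][0],
                             c['centre'][1] - c['p'][1]) / speed
            walk_in = hypot(o['centre'][0] - o['p'][0],
                            o['centre'][1] - o['p'][1]) / speed
            total += (walk_out + c['to_target']
                      + reach.get(o['source_node'], float('inf'))
                      + o['from_source'] + walk_in)
        mean = total / len(cells)
        T[c['id']] = 1.0 / mean if mean > 0 else 0.0
    return T
```

Note that one Dijkstra run per cell is reused for all destinations, which is what the paper's edge-based partitioning pushes further: cells sharing a nearest edge can also share the graph leg of the computation.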
Simulation of Zoning and Accessibility Dynamics
The framework simulates key urban phenomena through the agent's optimization process:
- Zoning: Explicit zoning rules are encoded in the reward function's interaction terms (e.g., attraction between residential and commercial/recreational, repulsion between residential and industrial). The agent learns to arrange land uses spatially to maximize aggregate scores based on these local neighborhood interactions, leading to emergent zoning patterns.
- Accessibility: Accessibility, derived from the road network topology and travel times via the integrated representation, plays a crucial role. It directly influences the utility scores for residential and commercial uses, promoting their concentration in well-connected areas. It also dictates the order of development, with the agent prioritizing more accessible cells, mimicking a common pattern in urban growth. Industrial placement is inversely related to centrality via the d_N(c) term.
The resulting city layouts reflect a balance between maximizing local land use compatibility (zoning) and leveraging the transportation network (accessibility), consistent with principles observed in LUTI models. The agent acts as a centralized planner optimizing a specific utility function defined by the reward structure.
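The accessibility-driven development order amounts to a one-line selection rule (a hypothetical helper, not the paper's code):

```python
def next_cell(undeveloped, acc):
    """Pick the most accessible undeveloped cell: the builder agent
    develops well-connected plots first, mimicking urban growth."""
    return max(undeveloped, key=lambda cell: acc[cell])
```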
Methodological Considerations and Limitations
Strengths:
- Grounds procedural generation in established urban modeling concepts (LUTI).
- Provides a functional mechanism for integrating distinct spatial representations (graph network, grid land use) for accessibility calculation.
- Explicitly models accessibility and land use interactions via the reward function.
- Leverages reinforcement learning to potentially find non-trivial optimal land use allocations.
- Can utilize real-world road network data, enhancing the plausibility of the generated context.
Limitations:
- Scalability: The accessibility calculation remains a bottleneck, limiting practical application to moderate city sizes due to its O(|C|²) complexity, even with optimizations.
- Static Network: The road network is fixed and predefined; the system does not model the co-evolution of transportation infrastructure and land use.
- Simplified LUTI: The reward functions are specific mathematical forms representing a simplification of complex socio-economic interactions and utility preferences. Parameter tuning (w) is crucial and potentially subjective.
- Single-Agent Paradigm: Real urban development results from decentralized decisions by numerous heterogeneous agents (households, firms, developers). This model uses a single, centralized agent optimizing a global function.
- Evaluation: Assessment relies heavily on the internal reward score and visual inspection, lacking rigorous comparison against quantitative metrics of real-world urban form or alternative generative models.
- Path Dependency: The final configuration is highly sensitive to the initial road network structure and the sequential nature of the agent's decisions.
Potential Applications
Despite limitations, the approach offers potential in several areas:
- Urban Planning Support: As a rapid prototyping tool for exploring land use scenarios on given networks and evaluating them based on LUTI-derived metrics.
- Virtual Environment Generation: Creating functionally zoned and more plausible urban environments for simulations, games, and digital twins.
- Educational Tools: Demonstrating the interplay of transport infrastructure, accessibility, and land use zoning in urban systems.
- Research Platform: Serving as a foundation for extensions incorporating dynamic networks, multiple agent types, more sophisticated economic models, or alternative learning paradigms.
Conclusion
This research presents a valuable step towards integrating established urban theories (LUTI models) into procedural city generation using an agent-based reinforcement learning framework. By explicitly modeling accessibility derived from a road network graph and encoding zoning preferences into the agent's reward function, the system generates land use patterns that reflect fundamental urban spatial organization principles. While scalability and the static nature of the transport network remain key limitations, the methodology provides a novel approach for creating more functionally realistic artificial cities.