- The paper introduces a novel LLM-based framework that simulates urban behaviors, emphasizing adaptive memory and value-driven planning.
- It integrates multi-dimensional agent cognition modules to accurately reproduce realistic time use and mobility patterns in large-scale urban settings.
- Experimental results demonstrate that CitySim outperforms baselines in human-likeness and crowd density modeling, showcasing its potential for urban planning research.
CitySim: Large-Scale LLM-Driven Simulation of Urban Behaviors and City Dynamics
CitySim introduces a comprehensive framework for simulating urban environments using large-scale populations of LLM-driven agents. The system is designed to address the limitations of traditional agent-based models, which often rely on rigid, hand-crafted rules and lack the capacity to capture the diversity, adaptability, and long-term dynamics of real human behavior. CitySim leverages the reasoning, planning, and language capabilities of LLMs to generate agents with nuanced intentions, evolving preferences, and context-sensitive behaviors, enabling the study and forecasting of complex urban phenomena.
Architecture and Agent Cognition
CitySim agents are instantiated with rich persona modules, incorporating demographic, psychographic, and habit-based attributes derived from real-world survey data. Each agent maintains:
- Temporal, Reflective, and Spatial Memory: Temporal memory logs chronologically ordered experiences; reflective memory synthesizes higher-level insights and attitudes; spatial memory encodes beliefs about points of interest (POIs) along multiple dimensions (e.g., price, atmosphere, satisfaction, convenience), updated via a Kalman filter and subject to decay.
- Belief and Needs Modules: Beliefs are updated after each POI visit using LLM-generated appraisals, while needs (hunger, energy, safety, social) are dynamically tracked and prioritized, with explicit thresholds triggering plan adaptation or interruption.
- Long-Term Goal Module: Agents periodically revise high-level aspirations, informed by Maslow’s hierarchy, financial status, social connectivity, and recent experiences, with LLMs generating structured short- and long-term goals.
- Recursive Value-Driven Planning: Daily schedules are constructed by recursively filling time blocks, starting with mandatory activities and adaptively inserting medium- and low-priority tasks based on current needs, goals, and situational context. Value-driven planning is realized through LLM calls that generate and evaluate candidate activities for each leisure block.
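To make the spatial-memory update concrete, the following is a minimal sketch of a scalar Kalman-style belief update for one POI dimension (for example, satisfaction), followed by a decay step between visits. The summary does not give the paper's exact equations, so the class name `Belief`, the functions `kalman_update` and `decay`, the decay-toward-a-prior form, and all numeric values are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    mean: float  # current estimate of one POI attribute, e.g. satisfaction in [0, 1]
    var: float   # uncertainty (variance) of that estimate


def kalman_update(belief: Belief, observation: float, obs_var: float) -> Belief:
    """Fuse one noisy appraisal into the belief (standard scalar Kalman update)."""
    gain = belief.var / (belief.var + obs_var)
    mean = belief.mean + gain * (observation - belief.mean)
    var = (1.0 - gain) * belief.var
    return Belief(mean, var)


def decay(belief: Belief, prior_mean: float, rate: float, process_var: float) -> Belief:
    """Assumed decay rule: drift toward a prior and grow uncertainty between visits."""
    mean = belief.mean + rate * (prior_mean - belief.mean)
    var = belief.var + process_var
    return Belief(mean, var)


# Hypothetical visit: the agent starts neutral and appraises the POI at 0.9.
b = Belief(mean=0.5, var=0.25)
b = kalman_update(b, observation=0.9, obs_var=0.25)   # gain 0.5, mean about 0.7
b = decay(b, prior_mean=0.5, rate=0.1, process_var=0.01)  # mean about 0.68
```

In this sketch a confident belief (small `var`) moves little after a single appraisal, while a stale belief whose variance has grown through decay is revised more strongly on the next visit.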
Mobility and social behaviors are further grounded in a belief-weighted gravity model for place selection, LLM-based vehicle choice, and a dynamic social network where relationship strengths evolve through both face-to-face and online interactions.
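One way to read "belief-weighted gravity model" is that each candidate POI is scored by its attractiveness times the agent's belief about it, divided by a power of distance, and a destination is then sampled in proportion to those scores. The sketch below follows that reading. The exact functional form is not stated in this summary, so the field names (`mass`, `belief`, `xy`), the exponent `beta`, and the distance floor are all hypothetical.

```python
import math
import random


def gravity_probabilities(pois, origin, beta=2.0):
    """Return a choice probability per POI: mass * belief / distance**beta, normalized."""
    weights = []
    for poi in pois:
        # Floor the distance so a POI at the agent's location does not divide by zero.
        dist = max(math.dist(origin, poi["xy"]), 0.1)
        weights.append(poi["mass"] * poi["belief"] / dist ** beta)
    total = sum(weights)
    return [w / total for w in weights]


def choose_poi(pois, origin, rng, beta=2.0):
    """Sample one POI according to the gravity probabilities."""
    probs = gravity_probabilities(pois, origin, beta)
    return rng.choices(pois, weights=probs, k=1)[0]


# Hypothetical candidates: a small nearby cafe the agent likes,
# and a larger, more distant one it is lukewarm about.
pois = [
    {"name": "cafe_a", "xy": (1.0, 0.0), "mass": 1.0, "belief": 0.8},
    {"name": "cafe_b", "xy": (2.0, 0.0), "mass": 4.0, "belief": 0.4},
]
probs = gravity_probabilities(pois, origin=(0.0, 0.0))  # about [0.667, 0.333]
picked = choose_poi(pois, origin=(0.0, 0.0), rng=random.Random(0))
```

Because prominent POIs carry a large `mass`, a model of this shape naturally concentrates visits on well-known places, which is consistent with the underestimation on small streets reported in the evaluation.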
Experimental Evaluation
CitySim is evaluated in the Tokyo metropolitan area with up to 1,000 agents, using GPT-4o-mini as the primary LLM. The framework is benchmarked against GeAn, AGA, HumanoidAgent, MobileCity, and AgentSociety across several dimensions:
- Macro-Level Time Use: Simulated time-use distributions closely match ground-truth data from the Japanese national time use survey, with activity shares by age group aligning with real-world statistics.
- Behavioral Realism: In pairwise human-likeness evaluations (naturalness, coherence, plausibility), CitySim achieves the highest average win rates, outperforming all baselines. The explicit modeling of needs, dynamic goals, and memory-based planning is identified as critical for producing adaptive, context-sensitive behavior.
- Mobility Patterns: CitySim accurately reproduces real-world travel distributions, capturing both the timing and amplitude of commuting peaks and weekend leisure activity. Competing models exhibit either rigid or diffused travel peaks.
- POI Popularity Prediction: CitySim demonstrates positive Spearman rank correlation between simulated and real-world POI popularity in Shibuya, outperforming AgentSociety. However, a positive bias toward well-known POIs is observed, reflecting LLM popularity bias.
- Well-Being Estimation: In predicting well-being classes from simulated survey responses, CitySim achieves a macro F1-score of 0.36, second only to a GBDT baseline trained on real data, and surpasses all agent-based baselines.
- Crowd Density Modeling: Simulated crowd density heatmaps in Shibuya closely match those derived from smartphone location data, with high densities around transit nodes and commercial streets. Underestimation in small streets is attributed to the gravity model’s emphasis on prominent POIs.
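Two of the metrics above, Spearman rank correlation (POI popularity) and macro F1 (well-being classes), can be computed as in the following dependency-free sketch. It illustrates the standard definitions only and is not the authors' evaluation code. The visit counts and class labels are made-up toy values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all classes seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (not from the paper): visits to four POIs, and four well-being labels.
rho = spearman([120, 45, 80, 10], [300, 150, 90, 20])  # about 0.8
f1 = macro_f1(["low", "mid", "high", "mid"], ["low", "mid", "mid", "high"])  # 0.5
```

Macro F1 weights every class equally regardless of how many respondents fall into it, so a score such as 0.36 should be read against the number of well-being classes and their balance rather than against accuracy-style intuitions.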
Ablation studies confirm the necessity of each architectural module: removing belief, needs, or persona modules leads to substantial drops in human-likeness scores across activity, dialogue, mobility, and event reaction domains.
CitySim demonstrates efficient scaling, with mean simulation step times increasing modestly as agent populations grow from 10^3 to 10^6. This supports the feasibility of large-scale urban simulations with minimal computational overhead.
Limitations
The authors acknowledge several limitations:
- Reproducibility: Some experimental data are proprietary and not publicly available.
- Bias and Hallucination: LLM-driven agents may inherit cultural, gender, and socioeconomic biases, and can hallucinate appraisals for less-known POIs.
- Black-Box Reasoning: The internal logic of LLMs remains opaque, complicating the interpretation of emergent agent behaviors.
- Evaluation Methodology: Reliance on LLM-as-judge evaluations may introduce stylistic bias, and human assessments are needed for more robust validation.
- Contextual Abstraction: Certain real-world factors (e.g., weather, micro-scale attractors, self-actualization needs) are not fully modeled.
Implications and Future Directions
CitySim establishes a scalable, flexible testbed for simulating and analyzing urban dynamics, with direct applications in urban planning, retail strategy, public safety, and social science research. The integration of LLMs enables agents to exhibit lifelike, adaptive, and context-aware behaviors at both micro and macro levels, supporting what-if analyses and policy evaluation in silico.
The framework’s modular design facilitates extension to additional domains (e.g., health, food environments) and the incorporation of richer contextual features (e.g., land use, infrastructure, historical mobility traces). Addressing the limitations of LLM-driven simulation—particularly with respect to bias, explainability, and evaluation—remains a critical area for future research. The development of more transparent, interpretable agent architectures and the integration of human-in-the-loop validation will be essential for the responsible deployment of synthetic societies in real-world decision-making contexts.
CitySim represents a significant advance in the use of LLM-powered agents for urban simulation, providing a robust foundation for both academic research and industry applications at the intersection of AI, behavioral modeling, and urban systems.