- The paper introduces a real-time framework that couples SUMO traffic simulation with photorealistic 4D digital cityscapes using Unreal Engine 5.
- It utilizes an asynchronous producer-consumer model with geospatial transformations and linear interpolation to ensure smooth, synchronized visualization.
- A user study with twenty participants indicates that spatial audio increases perceived realism and safety, and that safety perception tracks the simulated traffic speed.
Bridging Micro-scale Traffic Simulation and 4D Digital Cityscapes: Technical Synthesis and User-Centric Evaluation
Motivation and Context
Micro-scale traffic simulation platforms such as SUMO provide granular, agent-level traffic modeling essential for urban planning and transportation engineering. However, their traditional interfaces remain abstract, which hinders communication of simulation results to stakeholders and the public. 2D visualization, as exemplified by SUMO's default interface, limits experiential understanding of urban dynamics. Photorealistic digital twins rendered with game engine technology (Unreal Engine 5, UE5) address this gap by enabling immersive representation of real environments. This work proposes a real-time framework linking SUMO and UE5 that focuses on spatial and auditory fidelity, and evaluates it through a user study.
Figure 1: SUMO's default 2D visualization interface with traffic flow represented as moving shapes on a map.
System Architecture and Technical Implementation
The framework decouples simulation logic from visualization through an asynchronous producer-consumer model. A Bridge Actor mediates between SUMO's TraCI protocol and UE5's rendering loop: it ingests TCP packets carrying vehicle and traffic light state and applies geospatial transformations so that each entity is placed accurately on the reconstructed city map.
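The producer-consumer decoupling described above can be sketched in a few lines. This is only an illustration under stated assumptions: the producer stands in for the simulation side and the consumer for the render side, the snapshot layout (`veh0` mapped to an `(x, y)` pair) is hypothetical, and a thread-safe in-process queue replaces the TCP link of the actual system.

```python
import queue
import threading


def producer(steps, out_queue):
    """Stand-in for the simulation side: push one state snapshot per step."""
    for step in range(steps):
        # Hypothetical snapshot: vehicle id -> (x, y) in simulation coordinates.
        snapshot = {"veh0": (float(step), 0.0)}
        out_queue.put((step, snapshot))
    out_queue.put(None)  # sentinel: no more data will arrive


def consumer(in_queue):
    """Stand-in for the render side: drain snapshots at its own pace."""
    received = []
    while True:
        item = in_queue.get()
        if item is None:
            break
        received.append(item)
    return received


def run_bridge(steps=5):
    """Run producer and consumer concurrently, linked only by the queue."""
    q = queue.Queue(maxsize=64)  # bounded, so a slow consumer applies back-pressure
    worker = threading.Thread(target=producer, args=(steps, q))
    worker.start()
    result = consumer(q)
    worker.join()
    return result
```

Because the two sides share only the queue, either one can run at its own rate; the bounded queue size is an arbitrary choice for the sketch.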
Vehicle motion is smoothed by linear interpolation between discrete simulation steps, which avoids visible jumps and keeps movement fluid even when simulation and rendering run at different frequencies. Spatial fidelity is maintained by systematic coordinate transformation and by height alignment through raycasting against the 3D terrain mesh. Traffic light states are mapped from SUMO's XML network definition, providing dynamic, junction-based synchronization in the VR environment.
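As a minimal sketch of the interpolation step, assuming each vehicle keeps its two most recent simulation samples with timestamps (the function and parameter names here are illustrative, not taken from the framework), a render-time position could be computed as follows:

```python
def lerp(a, b, t):
    """Linear interpolation between scalars a and b for t in [0, 1]."""
    return a + (b - a) * t


def interpolate_position(prev_pos, next_pos, prev_time, next_time, render_time):
    """Estimate a position at render_time from two timestamped samples.

    The blend factor is clamped to [0, 1], so a frame rendered after
    next_time holds the latest known position instead of extrapolating.
    """
    span = next_time - prev_time
    if span <= 0:
        return tuple(next_pos)
    alpha = (render_time - prev_time) / span
    alpha = max(0.0, min(1.0, alpha))
    return tuple(lerp(p, n, alpha) for p, n in zip(prev_pos, next_pos))
```

For example, with samples `(0.0, 0.0)` at t=0 s and `(10.0, 4.0)` at t=1 s, a frame rendered at t=0.25 s yields `(2.5, 1.0)`. Clamping rather than extrapolating is a design choice of this sketch; it trades a short freeze for never overshooting the simulation.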
Scalability is achieved through distance culling and object pooling: actors are instantiated only within a configurable radius, and separate pools are maintained for each vehicle type so that actors are reused rather than recreated. Updates are scheduled rather than executed every frame, which reduces computational load while preserving responsiveness. Open Sound Control (OSC) is integrated for auralization, streaming kinematic data asynchronously to external audio engines and enabling multimodal feedback.
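The combination of distance culling and per-type pooling can be sketched as below. This is a simplified illustration, not the framework's implementation: actors are plain dictionaries standing in for engine actors, positions are 2D, and the names `VehiclePool` and `update_visible` are hypothetical.

```python
import math
from collections import defaultdict


class VehiclePool:
    """Keeps released actors in a separate free list per vehicle type."""

    def __init__(self):
        self._free = defaultdict(list)
        self.created = 0  # counts how many actors were actually constructed

    def acquire(self, vehicle_type):
        free = self._free[vehicle_type]
        if free:
            return free.pop()  # reuse an existing actor of this type
        self.created += 1
        return {"type": vehicle_type, "serial": self.created}

    def release(self, actor):
        self._free[actor["type"]].append(actor)


def update_visible(vehicles, viewer_pos, radius, active, pool):
    """Sync the active actors with the vehicles inside the culling radius.

    vehicles: id -> (type, (x, y)); active: id -> actor, mutated in place.
    """
    in_range = {
        vid for vid, (_, pos) in vehicles.items()
        if math.dist(pos, viewer_pos) <= radius
    }
    # Release first so freed actors can be reused within the same update.
    for vid in list(active):
        if vid not in in_range:
            pool.release(active.pop(vid))
    for vid in in_range:
        if vid not in active:
            active[vid] = pool.acquire(vehicles[vid][0])
    return active
```

Calling `update_visible` on a timer rather than every frame would correspond to the scheduled updates mentioned above.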
Figure 2: Unreal Engine 5-based 3D visualization of SUMO traffic simulation, over geospatial Zurich city reconstruction.
Figure 3: System architecture overview: Real-time mediation between SUMO simulation, geospatial transforms, and OSC auralization.
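To make the OSC auralization path more concrete, the following sketch encodes one kinematic update as an OSC 1.0 message using only the standard library. The address `/veh/0` and the choice of three float arguments are assumptions made for illustration; the message schema actually used by the framework is not specified here.

```python
import struct


def osc_string(text):
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    data += b"\x00" * (-len(data) % 4)
    return data


def osc_message(address, floats):
    """Build an OSC message whose arguments are all 32-bit big-endian floats."""
    type_tags = "," + "f" * len(floats)
    payload = struct.pack(">%df" % len(floats), *floats)
    return osc_string(address) + osc_string(type_tags) + payload
```

A message built this way, for instance `osc_message("/veh/0", [1.0, 2.0, 0.5])`, could be sent to an external audio engine over UDP with `socket.sendto`, which keeps the stream asynchronous with respect to rendering.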
Experimental Design and User Study
The framework's experiential fidelity was evaluated in a controlled user study with twenty participants, using Google's photorealistic 3D city tiles and HTC VIVE VR headsets. Participants experienced two distinct traffic scenarios (morning/slow, evening/fast), each with and without spatialized sound. Metrics included perceived safety, willingness to cross, realism, and immersion, assessed with a 22-item Likert questionnaire.
Quantitative results showed that spatial audio was significantly associated with higher perceived safety and realism (with sound: safety M=3.95, SD=1.10; realism M=4.00, SD=1.34). The difference in traffic speed between scenarios produced statistically significant differences in safety perception (p<0.03), indicating that participants' perception aligned with the simulated dynamics. With spatial sound, participants expressed greater willingness to cross and rated the junction as more attractive. Qualitative feedback revealed an intuitive grasp of traffic intensity and explicit discomfort with the faster, riskier traffic scenario.
Figure 4: User study results: Spatial audio increased perceived safety, willingness to cross, and attractiveness; scenario speed modulated safety assessments.
Practical and Theoretical Implications
The results show that simulation logic can be coupled reliably to immersive rendering, supporting the role of game engines in participatory urban planning. The influence of auralization underscores the value of multimodal simulation environments for communicating complex urban risk. The architecture's asynchronous design supports real-time responsiveness and scalability, offering a template for broader integration of heterogeneous simulation domains (e.g., pedestrian, cyclist, multi-modal agent interactions).
Practically, the system enables urban planners, transportation engineers, and stakeholders to evaluate critical junctions and infrastructural interventions with high experiential fidelity. The framework's modular structure and OSC interface facilitate cross-domain extension (e.g., environmental acoustics, agent-based audio synthesis).
Future Directions
The main current limitation is the unidirectional coupling from simulation to VR, which constrains interactivity and agent-environment adaptation. Future research should implement bidirectional interaction, in which VR users can directly influence vehicle behavior, opening applications in human-in-the-loop policy simulation. Upgrading the city model and VR hardware should improve visual comfort and reduce simulator sickness. Additional quantitative evaluation of VR-specific discomfort and longitudinal studies of behavioral adaptation are warranted.
The system provides a foundation for scale expansion, multi-agent modeling, and real-time feedback loops between urban design stakeholders and simulation environments, supporting iterative, evidence-based urban optimization.
Conclusion
The coupling of micro-scale traffic simulation and 4D digital cityscapes through a real-time architecture built on UE5 and OSC advances immersive urban modeling and participatory planning. High scores for user immersion, perceptual alignment, and safety perception, modulated by traffic dynamics and spatial audio, support both the technical robustness and the experiential efficacy of the framework. The combination of scalable architecture and multimodal feedback positions this approach as a bridge between simulation logic and actionable insight in complex urban environments.