- The paper introduces a four-layer architecture integrating multi-modal sensing, edge analytics, V2X/P2X communication, and adaptive actuation for VRU safety.
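- It uses the R-LiVIT dataset to quantify risk with surrogate-safety metrics such as time-to-collision (TTC) and post-encroachment time (PET), showing that VRUs make up about 49% of annotated road users and that night-time conditions pose significant challenges.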
- The study bridges surrogate-safety theory with real-time control, offering a scalable framework for intelligent intersections and proactive urban safety.
Integrated Roadside Sensing and Communication for VRU Safety at Signalized Intersections
Context and Motivation
Vulnerable Road Users (VRUs), including pedestrians, cyclists, and riders of powered two- and three-wheelers, experience disproportionately high injury and fatality rates at urban intersections. Vehicle-based sensing alone is insufficient because of occlusions, limited sensor coverage, and its inability to anticipate risk before a conflict develops. Infrastructure-based approaches, which combine LiDAR, radar, RGB and thermal cameras with edge analytics, are a promising direction, yet current deployments remain fragmented and are not integrated with real-time communication or adaptive intersection control.
Paper Contributions
This work specifies a comprehensive four-layer architecture for VRU safety at signalized intersections, unifying advances in perception, edge computation, V2X/P2X messaging, and adaptive actuation. The framework is grounded empirically in the R-LiVIT dataset [Mirlach et al., 2025], described as the first large-scale LiDAR-Visual-Thermal roadside dataset annotated for VRUs across multiple real-world intersection geometries and lighting conditions. The analysis culminates in a risk-decision protocol that maps conflict-proxy metrics (TTC, PET) to explicit control interventions.
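For reference, the two conflict-proxy metrics can be written as below. This is a sketch under common textbook assumptions (constant velocities over the prediction horizon, road users reduced to points with a combined collision radius r, and a fixed conflict zone for PET); the paper's exact definitions may differ.

```latex
\begin{aligned}
\mathrm{TTC}(t) &= \min\{\tau \ge 0 : \lVert \Delta\mathbf{p}(t) + \tau\,\Delta\mathbf{v}(t) \rVert \le r\},
  \qquad \Delta\mathbf{p} = \mathbf{p}_{\mathrm{VRU}} - \mathbf{p}_{\mathrm{veh}},\;
         \Delta\mathbf{v} = \mathbf{v}_{\mathrm{VRU}} - \mathbf{v}_{\mathrm{veh}},\\
\mathrm{PET} &= t^{(2)}_{\mathrm{in}} - t^{(1)}_{\mathrm{out}}.
\end{aligned}
```

TTC is taken as infinite when no such τ exists (the pair is not on a collision course). For PET, t_out^(1) is the time the first road user leaves the conflict zone and t_in^(2) the time the second one enters it. For both metrics, lower values indicate a more severe conflict.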
Key claims and results include:
- VRUs constitute approximately 49% of all annotated road-users at studied intersections.
- Nighttime VRU densities drop by 38%, yet the rate of close-proximity VRU-vehicle pairs increases, which justifies thermal sensing and night-optimized analytics.
- Close-proximity events per frame vary by an order of magnitude across locations.
- 83% of pedestrian bounding boxes are small, indicating that most pedestrians are observed far from the cameras; this underscores the need for multi-modal sensors with overlapping placement.
- No fully integrated system has previously combined multi-modal sensing, edge analytics, bidirectional V2X/P2X, and adaptive actuation in a single deployable intersection cabinet.
Integrated Framework: Architecture and Rationale
Perception Layer
The framework mandates multi-sensor deployments (32-channel mechanical LiDAR, 77 GHz radar, and RGB and thermal cameras) mounted on intersection infrastructure with overlapping fields of view. Each modality compensates for the known limitations of the others: RGB cameras degrade at night and in adverse weather, radar can miss targets with a low radar cross-section (RCS) such as VRUs, LiDAR loses range in precipitation, and thermal imaging lacks the texture needed for fine-grained discrimination. Calibration and synchronization are handled with state-of-the-art targetless methods, chosen for ease of maintenance.
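Because the modalities run at different frame rates, fused analytics first need to pair frames in time. The sketch below shows only software-side timestamp matching (nearest neighbour within a tolerance); it is not the targetless calibration or synchronization method the paper refers to, and the function name, tolerance, and example rates are hypothetical.

```python
from bisect import bisect_left


def match_nearest(reference_ts, other_ts, tolerance_s):
    """For each timestamp in reference_ts, return the index of the nearest
    timestamp in other_ts (sorted ascending) if it lies within tolerance_s,
    otherwise None. Hypothetical helper for illustration only."""
    matches = []
    for t in reference_ts:
        i = bisect_left(other_ts, t)  # insertion point: neighbours are i-1 and i
        best, best_dt = None, tolerance_s
        for j in (i - 1, i):
            if 0 <= j < len(other_ts):
                dt = abs(other_ts[j] - t)
                if dt <= best_dt:  # keep the closest candidate within tolerance
                    best, best_dt = j, dt
        matches.append(best)
    return matches


# Example: 10 Hz LiDAR sweeps matched against ~30 Hz camera frames.
lidar_ts = [0.0, 0.1, 0.2, 0.3]
camera_ts = [0.00, 0.033, 0.067, 0.100, 0.133, 0.167, 0.200]
print(match_nearest(lidar_ts, camera_ts, 0.02))  # [0, 3, 6, None]
```

The last LiDAR sweep has no camera frame within 20 ms and is left unmatched rather than paired with a stale image, which is usually preferable for safety analytics.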
Edge Computation and Analytics
Edge cabinets run the full analytics pipeline: detection (YOLO-based for camera feeds, voxel-based for LiDAR, clustering for radar), tracking, multi-modal fusion (feature-level for VRUs, decision-level for vehicles), trajectory prediction (Kalman filtering and learned predictors), and surrogate safety analysis (frame-level TTC and PET computation for every VRU-vehicle pair). Processing at the edge rather than in the cloud is motivated by strict latency and bandwidth constraints and by privacy concerns around visual data.
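The surrogate-safety step can be sketched for a single VRU-vehicle pair as follows. This assumes constant velocities in a ground-plane coordinate frame, a combined collision radius, and an axis-aligned rectangular conflict zone; these are simplifying assumptions, not the paper's implementation, and all names are hypothetical.

```python
import math


def time_to_collision(p_a, v_a, p_b, v_b, radius):
    """Smallest tau >= 0 with |dp + tau*dv| <= radius, assuming constant
    velocities; math.inf if the pair never comes within radius."""
    dpx, dpy = p_a[0] - p_b[0], p_a[1] - p_b[1]
    dvx, dvy = v_a[0] - v_b[0], v_a[1] - v_b[1]
    c = dpx * dpx + dpy * dpy - radius * radius
    if c <= 0:
        return 0.0  # already within the collision radius
    a = dvx * dvx + dvy * dvy
    if a == 0:
        return math.inf  # no relative motion, gap never closes
    b = 2 * (dpx * dvx + dpy * dvy)
    disc = b * b - 4 * a * c
    if disc < 0:
        return math.inf  # closest approach stays outside the radius
    tau = (-b - math.sqrt(disc)) / (2 * a)  # earlier root = first contact
    return tau if tau >= 0 else math.inf  # negative roots: moving apart


def zone_times(track, zone):
    """track: time-sorted (t, x, y) samples; zone: (xmin, ymin, xmax, ymax).
    Returns (first time inside, last time inside) or None."""
    inside = [t for t, x, y in track
              if zone[0] <= x <= zone[2] and zone[1] <= y <= zone[3]]
    return (inside[0], inside[-1]) if inside else None


def post_encroachment_time(track_a, track_b, zone):
    """Frame-sampled PET: gap between the first user leaving the zone and the
    second entering it; 0.0 if occupancy overlaps, None if one never enters."""
    za, zb = zone_times(track_a, zone), zone_times(track_b, zone)
    if za is None or zb is None:
        return None
    if za[1] <= zb[0]:
        return zb[0] - za[1]
    if zb[1] <= za[0]:
        return za[0] - zb[1]
    return 0.0


# Stationary pedestrian, vehicle 20 m away approaching at 10 m/s, 2 m radius:
print(time_to_collision((0, 0), (0, 0), (20, 0), (-10, 0), 2.0))  # 1.8
```

In a deployment these functions would be evaluated per frame over all tracked VRU-vehicle pairs, with positions and velocities coming from the tracking and prediction stages.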
Communication
Operating the framework depends on robust C-V2X and DSRC interfaces for broadcasting Sensor Data Sharing Messages (SDSM) and Personal Safety Messages (PSM). Beyond direct vehicle communication, the system supports pedestrian-to-everything (P2X) interoperability with mobile devices. Prior pilots (e.g., THEA Tampa, Utah DSRC snowplows) have demonstrated the feasibility of these communication primitives.
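To make the payload concrete, the sketch below models the kind of information a roadside unit would share about detected road users. It is a hypothetical JSON stand-in: the actual SDSM (SAE J3224) and PSM (SAE J2735) are defined in ASN.1 and typically UPER-encoded, with different field names, units, and encodings. Every class and field name here is an assumption.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DetectedObject:
    object_id: int
    object_type: str    # hypothetical labels, e.g. "pedestrian", "cyclist", "vehicle"
    x_m: float          # metres east of the intersection reference point
    y_m: float          # metres north of the intersection reference point
    speed_mps: float
    heading_deg: float


@dataclass
class SensorDataSharingSketch:
    """Simplified, non-standard stand-in for a sensor-sharing broadcast."""
    sender_id: str
    timestamp_ms: int
    ref_lat: float      # reference point for the relative object positions
    ref_lon: float
    objects: List[DetectedObject]

    def to_json(self) -> str:
        # asdict recurses into the list of DetectedObject dataclasses
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SensorDataSharingSketch":
        data = json.loads(text)
        data["objects"] = [DetectedObject(**o) for o in data["objects"]]
        return cls(**data)


msg = SensorDataSharingSketch(
    "rsu-01", 1000, 48.0, 11.0,
    [DetectedObject(1, "pedestrian", 3.0, -2.0, 1.4, 90.0)],
)
print(msg.to_json())
```

Sharing object lists relative to a surveyed reference point, rather than raw images, also fits the privacy motivation for edge processing.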
Adaptive Actuation
The system adapts signal phases (e.g., extending pedestrian green or holding vehicle red) and activates warning beacons and projector-based warnings in real time, based on graded TTC/PET thresholds. Control responses are parameterized using conflict-to-crash transfer models (Hydén, 1987; Tarko, 2018), which provide a principled bridge from near-miss analytics to actuation.
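A graded threshold policy of this kind can be sketched as a lookup from TTC and PET to an escalating action. The action tiers and threshold values below are placeholders chosen for illustration; in the framework they would be calibrated with conflict-to-crash models rather than fixed by hand.

```python
import math
from enum import Enum


class Action(Enum):
    NONE = 0
    ADVISORY = 1     # e.g. warning beacon or projected warning
    WARNING = 2      # e.g. V2X/P2X warning broadcast plus beacon
    PHASE_HOLD = 3   # e.g. hold vehicle red or extend pedestrian green


# Hypothetical thresholds in seconds, ordered from most to least severe.
TTC_LEVELS = [(1.5, Action.PHASE_HOLD), (3.0, Action.WARNING), (5.0, Action.ADVISORY)]
PET_LEVELS = [(1.0, Action.PHASE_HOLD), (2.0, Action.WARNING), (3.0, Action.ADVISORY)]


def grade(value, levels):
    """Map a metric value to the first (most severe) level it falls below."""
    if value is None:
        return Action.NONE
    for threshold, action in levels:
        if value < threshold:
            return action
    return Action.NONE  # also covers math.inf (no collision course)


def decide(ttc_s, pet_s):
    """Return the more severe of the TTC-based and PET-based actions."""
    return max(grade(ttc_s, TTC_LEVELS), grade(pet_s, PET_LEVELS),
               key=lambda a: a.value)


print(decide(1.0, None))      # Action.PHASE_HOLD
print(decide(math.inf, 1.5))  # Action.WARNING
```

Taking the maximum severity across both metrics is a conservative choice: either proxy alone can trigger escalation.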
Smart-City Integration
Real-time analytics also feed longer-term safety diagnostics and urban informatics, including proactive hot-spot identification and causal inference for policy evaluation; the paper names causal forests and graph-based prediction as scalable techniques for these tasks.
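As a baseline for hot-spot identification, logged conflicts can be normalized by exposure and compared across locations. The sketch below is a deliberately simple rate screen (flag locations whose conflict rate exceeds a multiple of the network median), not the causal-forest or graph-based methods the paper mentions; the function name, the per-hour exposure unit, and the factor are assumptions. Naive rate rankings are also sensitive to regression to the mean, which is why safety studies often prefer Empirical Bayes adjustments.

```python
from statistics import median


def flag_hot_spots(conflicts, exposure_hours, factor=2.0):
    """conflicts, exposure_hours: dicts keyed by location id.
    Returns ids whose conflicts-per-hour exceed factor * median rate,
    sorted from highest to lowest rate. Hypothetical screening rule."""
    rates = {loc: conflicts.get(loc, 0) / hours
             for loc, hours in exposure_hours.items() if hours > 0}
    if not rates:
        return []
    reference = median(rates.values())
    flagged = [loc for loc, r in rates.items() if r > factor * reference]
    return sorted(flagged, key=lambda loc: -rates[loc])


counts = {"A": 12, "B": 3, "C": 40, "D": 5}
hours = {"A": 10, "B": 10, "C": 10, "D": 10}
print(flag_hot_spots(counts, hours))  # ['C']
```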
Empirical Case Study: R-LiVIT Dataset
The R-LiVIT benchmark, comprising 53,319 multi-class annotations across 200 day/night sequences and three urban intersections, serves as the statistical foundation for the framework's design and validation. Salient findings include:
- VRUs are nearly half of all road users, with pedestrians accounting for 81% of VRUs.
- VRU and vehicle densities split into distinct day and night regimes, but close-proximity VRU-vehicle events are disproportionately concentrated at night.
- Close-proximity event frequency varies tenfold across intersections, supporting deployments whose sensing resources and analytics adapt to each location.
- VRU detection is challenged by distant observations with small bounding boxes: 83.2% of pedestrians, 70.4% of bicycles, and 91.3% of e-scooters fall below a small image-area threshold.
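Statistics like these can be computed directly from per-frame 2D annotations. The sketch below computes the share of small boxes per class and the mean number of close VRU-vehicle pairs per frame, using bounding-box centre distance in pixels as an image-plane proximity proxy. The area cutoff (the COCO-style 32 × 32 px "small" definition), the class labels, and the pixel distance are assumptions; the paper's actual thresholds are not stated here.

```python
import math
from collections import defaultdict

SMALL_AREA_PX = 32 * 32  # assumption: COCO-style "small" cutoff
VRU_CLASSES = {"pedestrian", "bicycle", "e-scooter"}  # hypothetical label set


def small_box_share(boxes):
    """boxes: iterable of (cls, x1, y1, x2, y2) in pixels.
    Returns {cls: fraction of that class's boxes with area < SMALL_AREA_PX}."""
    total, small = defaultdict(int), defaultdict(int)
    for cls, x1, y1, x2, y2 in boxes:
        total[cls] += 1
        if (x2 - x1) * (y2 - y1) < SMALL_AREA_PX:
            small[cls] += 1
    return {cls: small[cls] / total[cls] for cls in total}


def close_pairs_per_frame(frames, max_dist_px):
    """frames: list of per-frame box lists. Counts VRU-vehicle pairs whose box
    centres lie within max_dist_px (image-plane proxy, not metric distance);
    every non-VRU class is treated as a vehicle. Returns mean pairs per frame."""
    if not frames:
        return 0.0
    count = 0
    for boxes in frames:
        centres = [(cls, (x1 + x2) / 2, (y1 + y2) / 2) for cls, x1, y1, x2, y2 in boxes]
        vrus = [c for c in centres if c[0] in VRU_CLASSES]
        vehicles = [c for c in centres if c[0] not in VRU_CLASSES]
        for _, ux, uy in vrus:
            for _, vx, vy in vehicles:
                if math.hypot(ux - vx, uy - vy) <= max_dist_px:
                    count += 1
    return count / len(frames)


boxes = [("pedestrian", 0, 0, 20, 40), ("pedestrian", 0, 0, 50, 50), ("car", 0, 0, 100, 60)]
print(small_box_share(boxes))  # {'pedestrian': 0.5, 'car': 0.0}
```

As a quick consistency check on the reported figures, if VRUs are about 49% of road users and pedestrians are 81% of VRUs, pedestrians alone account for roughly 0.49 × 0.81 ≈ 40% of all annotated road users.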
The case study's mixed-method analytics (distributional, location-specific, and proximity-based) systematically inform the integrated system's design choices, whereas prior works offer only conceptual justification.
Theoretical and Practical Implications
This research addresses the well-established gap between piecemeal sensor deployments and proactive, intersection-wide VRU protection. Its quantitative analysis shows that single-modality or non-adaptive deployments cannot account for the spatial and temporal heterogeneity in VRU and vehicle co-presence. The proposed multi-modal, multi-stage edge analytics system, coupled with context-sensitive actuation, offers a scalable template for cities seeking to upgrade legacy infrastructure.
From a theoretical perspective, using conflict surrogate metrics as real-time control signals operationalizes decades of surrogate-safety theory, connecting these metrics to actuation for the first time within a unified edge-infrastructure system.
Limitations and Future Work
The paper is candid about its scope. It explicitly notes limited geographic generalization (the study covers only three German intersections) and its reliance on image-plane proxies for true proximity. Operational aspects (procurement, long-term maintenance, and standardization hurdles for multi-modal edge deployments) are acknowledged as unresolved.
Future research avenues include:
- Deployment and benchmarking at operational intersections, including end-to-end latency and reliability measurement of SDSM and PSM messaging.
- Regional replication beyond the German sample, leveraging new traffic exposure datasets.
- Longitudinal causal effect estimation using instrumented deployments and policy impacts.
- Trajectory-level analytics with full object identification.
- Edge integration of LLM-driven scene reasoning to handle non-normative VRU behaviors.
Conclusion
This paper presents a comprehensive, empirically grounded framework for adaptive roadside VRU safety at signalized intersections, integrating multi-modal sensing, edge analytics, V2X/P2X communication, and adaptive phase control. Quantitative analysis of the R-LiVIT dataset substantiates each architectural decision and highlights the importance of multi-modal, context-sensitive, real-time approaches. The system provides a concrete foundation for next-generation intelligent intersection deployments, supporting both operational objectives and future research at the intersection of urban informatics, edge AI, and cooperative transport safety.