An Integrated Roadside Sensing and Communication Framework for Vulnerable Road User Safety at Signalized Intersections

Published 5 Jun 2026 in stat.AP and cs.CV | (2606.07016v1)

Abstract: Vulnerable road users (VRUs) account for approximately half of urban traffic deaths globally, with intersections concentrating a disproportionate share of these casualties. Recent reviews of sensing technology for VRU protection have cataloged dozens of single-sensor and dual-sensor deployments, yet none of the surveyed systems couples multi-modal sensing with edge-side near-miss analytics and bidirectional vehicle-to-everything (V2X) and pedestrian-to-everything (P2X) messaging in a single intersection cabinet. This paper presents an integrated framework for VRU protection at signalized intersections, combining LiDAR, radar, RGB camera, and thermal camera at the perception layer, edge-based prediction and surrogate-safety analytics at the computation layer, V2X and P2X messaging at the communication layer, and adaptive signal control at the actuation layer. The framework is grounded in an empirical case study using R-LiViT, the first publicly released roadside LiDAR-Visual-Thermal dataset, which provides 200 multi-modal sequences and 2,400 annotated RGB-T frames at three German intersections. Analysis of 53,319 detection annotations reveals that VRUs comprise approximately 49% of all road-user observations, that day-to-night density drops by 38% for pedestrians and 45% for vehicles while the night distribution shows a higher close-proximity share, that per-frame close-proximity event counts vary approximately 10-fold across the eight unique locations at three intersections, and that 83% of pedestrian bounding boxes are small in image space, indicating that VRUs are typically far from any single sensor. These findings support multi-modal sensing, edge-side analytics, and adaptive context-sensitive deployment rather than uniform single-sensor solutions.


Authors (1)

Parvez Anowar

Summary

The paper introduces a four-layer architecture integrating multi-modal sensing, edge analytics, V2X/P2X communication, and adaptive actuation for VRU safety.
It leverages the R-LiViT dataset to quantify risk with metrics like TTC and PET, showing that VRUs make up about 49% of road-user observations and that close-proximity events take a larger share at night.
The study bridges surrogate-safety theory with real-time control, offering a scalable framework for intelligent intersections and proactive urban safety.

Integrated Roadside Sensing and Communication for VRU Safety at Signalized Intersections

Context and Motivation

Vulnerable Road Users (VRUs), including pedestrians, cyclists, and riders of powered two- and three-wheelers, experience disproportionately high injury and fatality rates at urban intersections. Vehicle-based sensing alone is insufficient due to occlusions, limited sensor coverage, and the inability to address risk proactively. Infrastructure-based approaches that combine LiDAR, radar, RGB and thermal cameras, and edge analytics represent a promising direction, yet current deployments remain fragmented and lack integration with real-time communication and adaptive intersection control.

Paper Contributions

This work specifies a four-layer architecture to address VRU safety at signalized intersections, unifying advances across perception, edge computation, V2X/P2X messaging, and adaptive actuation. The framework is grounded empirically in the R-LiViT dataset [Mirlach et al., 2025], the first publicly released roadside LiDAR-Visual-Thermal dataset, which is annotated for VRUs across multiple real-world intersection geometries and lighting conditions. The analysis culminates in a risk-decision protocol derived from two conflict-proxy metrics, time-to-collision (TTC) and post-encroachment time (PET), mapped to explicit control interventions.

Key claims and results include:

VRUs constitute approximately 49% of all annotated road-users at studied intersections.
Day-to-night density drops by 38% for pedestrians and 45% for vehicles, yet the night distribution shows a higher close-proximity share, justifying thermal sensing and night-optimized analytics.
Close-proximity events per frame vary by an order of magnitude across locations.
83% of pedestrian bounding boxes are small, indicating most VRUs are detected at a significant distance from cameras, underscoring the necessity for multi-modal, overlapping sensor placement.
No fully integrated system has previously combined multi-modal sensing, edge analytics, bidirectional V2X/P2X, and adaptive actuation in a single deployable intersection cabinet.

Integrated Framework: Architecture and Rationale

Perception Layer

The framework mandates multi-sensor deployments (32-channel mechanical LiDAR, 77 GHz radar, RGB and thermal imaging) mounted on intersection infrastructure with overlapping fields of view. Each modality compensates for the known limitations of the others: RGB cameras are compromised at night or in adverse weather, radar misses targets with a low radar cross-section (RCS) such as VRUs, LiDAR has limited range in precipitation, and thermal imaging lacks textural discrimination. Calibration and synchronization rely on targetless methods, which keeps the installation maintainable.

Edge Computation and Analytics

Edge cabinets execute detection (YOLO-based for camera feeds, voxel-based for LiDAR, clustering for radar), tracking, multi-modal fusion (feature-level for VRUs; decision-level for vehicles), trajectory prediction (Kalman filtering, learned predictors), and surrogate safety analysis (frame-level TTC and PET computation on all VRU-vehicle pairs). Processing at the edge rather than in the cloud is motivated by strict latency and bandwidth constraints, as well as by privacy considerations for visual data.
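As an illustration of the surrogate-safety step, the following Python sketch computes TTC for a single VRU-vehicle pair under a constant-velocity assumption, and PET from two conflict-zone timestamps. The `Track` class, the 1 m collision radius, and the function names are assumptions made for this example; the paper's own formulation may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    """Ground-plane state of one road user (metres and metres/second)."""
    x: float
    y: float
    vx: float
    vy: float


def time_to_collision(a: Track, b: Track, radius: float = 1.0) -> float:
    """Constant-velocity TTC: first time the two centres come within `radius`.

    Returns math.inf when the pair never gets that close.
    `radius` is an assumed combined footprint, not a value from the paper.
    """
    # Position and velocity of b relative to a.
    px, py = b.x - a.x, b.y - a.y
    vx, vy = b.vx - a.vx, b.vy - a.vy
    # Solve |p + v*t|^2 = radius^2, i.e. qa*t^2 + qb*t + qc = 0.
    qa = vx * vx + vy * vy
    qb = 2.0 * (px * vx + py * vy)
    qc = px * px + py * py - radius * radius
    if qc <= 0.0:
        return 0.0  # already within the collision radius
    if qa == 0.0:
        return math.inf  # no relative motion
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        return math.inf  # paths never come within `radius`
    t = (-qb - math.sqrt(disc)) / (2.0 * qa)
    return t if t >= 0.0 else math.inf  # negative root: the pair is separating


def post_encroachment_time(first_exit: float, second_entry: float) -> float:
    """PET: time between the first user leaving the conflict zone and the second entering it."""
    return second_entry - first_exit


# Toy pair: a vehicle approaching a stationary pedestrian 21 m ahead at 10 m/s.
vehicle = Track(x=0.0, y=0.0, vx=10.0, vy=0.0)
pedestrian = Track(x=21.0, y=0.0, vx=0.0, vy=0.0)
ttc = time_to_collision(vehicle, pedestrian)
pet = post_encroachment_time(first_exit=3.0, second_entry=4.5)
```

In a frame-level pipeline this pairwise function would be evaluated for every VRU-vehicle pair using the tracker's current state estimates; a learned trajectory predictor could replace the constant-velocity assumption.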

Communication

Operating the framework depends on robust cellular V2X (C-V2X) and dedicated short-range communications (DSRC) interfaces for broadcasting Sensor Data Sharing Messages (SDSM) and Personal Safety Messages (PSM). In addition to direct vehicle communications, the system enables pedestrian-to-everything (P2X) interoperability for mobile devices. Prior pilots (e.g., THEA Tampa, Utah DSRC snowplows) have demonstrated the feasibility of these communication primitives.

Adaptive Actuation

The system modulates signal phases (e.g., pedestrian green, vehicle red) and triggers warning beacons and projector-based warnings in real time based on graded TTC/PET thresholds. Control response is fully parameterized via conflict-to-crash transfer models (Hyden, 1987; Tarko, 2018), providing a principled bridge from near-miss analytics to actuation.
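A graded threshold scheme of this kind can be sketched in Python as follows. The numeric bounds, the four risk levels, and the action names are hypothetical placeholders chosen for illustration; this summary does not give the paper's actual thresholds or its mapping to interventions.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    NONE = 0
    ADVISORY = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical upper bounds in seconds, listed from most to least severe.
TTC_BOUNDS = ((RiskLevel.CRITICAL, 1.5), (RiskLevel.WARNING, 3.0), (RiskLevel.ADVISORY, 5.0))
PET_BOUNDS = ((RiskLevel.CRITICAL, 1.0), (RiskLevel.WARNING, 2.0), (RiskLevel.ADVISORY, 3.0))

# Hypothetical mapping from risk level to the intervention types named in the text.
ACTIONS = {
    RiskLevel.NONE: [],
    RiskLevel.ADVISORY: ["warning_beacon"],
    RiskLevel.WARNING: ["warning_beacon", "projector_warning"],
    RiskLevel.CRITICAL: ["warning_beacon", "projector_warning",
                         "hold_vehicle_red", "extend_pedestrian_green"],
}


def grade(value: float, bounds) -> RiskLevel:
    """Return the most severe level whose bound the value falls below."""
    for level, bound in bounds:
        if value < bound:
            return level
    return RiskLevel.NONE


def decide(ttc: float, pet: float):
    """Take the more severe of the TTC and PET grades and look up its actions."""
    level = max(grade(ttc, TTC_BOUNDS), grade(pet, PET_BOUNDS))
    return level, ACTIONS[level]


level, actions = decide(ttc=2.0, pet=10.0)
```

Taking the maximum of the two grades is one conservative design choice: either metric alone can escalate the response. A deployed system would calibrate the bounds per site, for instance through the conflict-to-crash transfer models cited above.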

Smart-City Integration

Real-time analytics feed into longer-term safety diagnostics and urban informatics, including proactive hot-spot identification and causal inference for policy evaluation. The paper mentions scalable techniques for this purpose, such as causal forests and graph-based prediction.

Empirical Case Study: R-LiViT Dataset

The R-LiViT benchmark, comprising 53,319 multi-class annotations across 200 day/night sequences and three urban intersections, serves as the statistical foundation for the framework's design and validation. Salient findings include:

VRUs are nearly half of all road users, with pedestrians accounting for 81% of VRUs.
VRU and vehicle densities are both lower at night than by day, yet close-proximity VRU-vehicle events make up a larger share of the night distribution.
Close-proximity event frequency varies approximately tenfold across the eight unique locations at the three intersections, supporting the notion of resource- and analytics-adaptive deployments.
VRU detection is challenged by distant (small bounding-box) observations: 83.2% of pedestrians, 70.4% of bicycles, and 91.3% of e-scooters fall below a modest image area.
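The small-bounding-box shares in the last finding can be reproduced in principle with a per-class tally like the Python sketch below. The 32 x 32 pixel area cutoff follows the common COCO "small object" convention and is an assumption here, as are the `(label, width, height)` input layout and the toy boxes; this summary does not state the threshold the paper used.

```python
from collections import defaultdict

# Assumed cutoff in square pixels (COCO "small" convention); the paper's threshold may differ.
SMALL_AREA_PX = 32 * 32


def small_box_share(boxes, max_area=SMALL_AREA_PX):
    """Fraction of boxes per class whose image-plane area is below `max_area`.

    `boxes` is an iterable of (label, width_px, height_px) tuples.
    """
    totals = defaultdict(int)
    small = defaultdict(int)
    for label, width, height in boxes:
        totals[label] += 1
        if width * height < max_area:
            small[label] += 1
    return {label: small[label] / totals[label] for label in totals}


# Toy annotations, not taken from the dataset.
toy_boxes = [
    ("pedestrian", 10, 25),   # 250 px^2, small
    ("pedestrian", 40, 90),   # 3600 px^2, not small
    ("pedestrian", 12, 30),   # 360 px^2, small
    ("bicycle", 50, 40),      # 2000 px^2, not small
]
shares = small_box_share(toy_boxes)
```

Because image-plane area depends on both distance and camera optics, a tally like this is only a proxy for range, which is consistent with the limitation the paper itself notes about image-plane proxies.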

The case study's mixed-method analytics (distributional, location-specific, and proximity-based) systematically inform the integrated system's design choices, unlike prior works, which are limited to conceptual justification.

Theoretical and Practical Implications

This research bridges the established gap between piecemeal sensor deployments and proactive, intersection-wide VRU protection. It demonstrates, by quantitative analysis, that single-modality or non-adaptive deployments are insufficient to account for the spatial/temporal heterogeneity and co-presence of VRUs and vehicles. The proposed multi-modal and multi-stage edge analytics system, coupled with context-sensitive actuation, offers a scalable template for cities seeking to upgrade legacy infrastructure.

From a theoretical perspective, the adoption of conflict surrogate metrics as real-time control signals operationalizes decades of surrogate-safety theory and connects them to actuation for the first time in a unified edge-infrastructure system.

Limitations and Future Work

The paper is candid in its scope and boundaries. There is an explicit limitation regarding geographic generalization (the study uses only three German intersections) and reliance on image-plane proxies for true proximity. Operational aspects (procurement, long-term maintenance, and standardization hurdles for multi-modal edge deployments) are explicitly acknowledged as unresolved.

Future research avenues include:

Deployment and benchmarking at operational intersections, including end-to-end latency and reliability measurement of SDSM and PSM messaging.
Regional replication beyond the German sample, leveraging new traffic exposure datasets.
Longitudinal causal effect estimation using instrumented deployments and policy impacts.
Trajectory-level analytics with full object identification.
Edge integration of LLM-driven scene reasoning to handle non-normative VRU behaviors.

Conclusion

This paper presents a comprehensive, empirically grounded framework for adaptive roadside VRU safety at signalized intersections, integrating multi-modal sensing, edge analytics, V2X/P2X communication, and adaptive phase control. Quantitative analysis of the R-LiViT dataset supports each architectural decision and highlights the importance of multi-modal, context-sensitive, and real-time approaches. The system serves as a concrete foundation for next-generation intelligent intersection deployments, supporting both operational objectives and future research at the intersection of urban informatics, edge AI, and cooperative transport safety.
