- The paper introduces VRUD, a high-resolution drone dataset that captures complex vehicle-VRU interactions in unstructured, high-density urban settings.
- It employs advanced preprocessing techniques like multi-video alignment and trajectory extraction using YOLO11x-OBB and ByteTrack to ensure spatial-temporal precision.
- Key findings reveal a critical 0.7s VTTC interaction window and an inverse relationship between vehicle speed and VRU density, offering actionable insights for ADS development.
VRUD: A Drone-Based Dataset for Vehicle-VRU Interactions in Mixed Urban Traffic
Introduction
The VRUD dataset is a high-resolution, drone-based collection of vehicle and vulnerable road user (VRU) trajectories recorded in complex, unstructured urban “village” environments in Shenzhen, China. It addresses the scarcity of data for high-density, unpredictable traffic domains in which VRUs dominate the road user mix and conventional vehicle-centric datasets are of limited use. The corpus comprises over 4 hours of 4K/30fps aerial video, yielding 11,479 VRU and 1,939 vehicle trajectories, with VRUs constituting approximately 87% of all agents. The dataset is tailored to the development and evaluation of robust, interaction-aware autonomous driving systems (ADS) that must operate beyond structured scenarios.
Methodology and Data Properties
Data Acquisition and Preprocessing
Drone-based acquisition took place during morning and evening urban peak hours, at sites selected for their lack of fixed surveillance and their high occlusion and heterogeneity. Key settings included a flight altitude of 80 meters and image acquisition at 4K/30fps, ensuring high fidelity for small dynamic agents (e.g., pedestrians, cyclists, motorcycles). Privacy is protected by the operating altitude itself, which obscures personally identifiable details without explicit redaction.
Preprocessing is performed in two primary stages: multi-video alignment and trajectory extraction.
Trajectory Extraction and Validation
Seven traffic categories (pedestrians, bicycles, motorcycles, tricycles, cars, buses, trucks) are detected with a YOLO11x-OBB model trained on 3,000 manually labeled images. Perspective rectification is handled via an L-shape method that accounts for aerial geometric distortion. Tracking is performed with ByteTrack, whose parameters are tuned to maximize trajectory continuity for small and frequently occluded agents.
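To make the oriented-bounding-box (OBB) output more concrete, the following minimal sketch converts one box into its four corner points. It assumes the common center, width, height, rotation parameterization with the angle in radians; the function name `obb_corners` and this parameter layout are illustrative assumptions, not the dataset's documented format or the authors' code.

```python
import math

def obb_corners(cx, cy, w, h, angle_rad):
    """Return the four corners of an oriented box as (x, y) tuples.

    Assumed layout (hypothetical): box center (cx, cy), side lengths w and h,
    and a counter-clockwise rotation angle in radians.
    """
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets in the box's own (unrotated) frame.
    offsets = [(w / 2, h / 2), (-w / 2, h / 2), (-w / 2, -h / 2), (w / 2, -h / 2)]
    # Rotate each offset by the box angle, then translate to the center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]

# Usage: a 4 x 2 box centered at the origin, rotated by 90 degrees.
corners = obb_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2)
```

Unlike an axis-aligned box, the rotated footprint keeps the heading of elongated agents such as motorcycles and buses, which is presumably why an OBB detector suits top-down drone footage.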
Empirical validation employs comparisons to RT ground-truth navigation systems, confirming high-precision spatial-temporal correspondence for critical vehicle dynamics.
Scenario Definition and Interaction Labeling
Each interaction scenario is ego-centric: for every motorized agent, all spatio-temporally proximal VRUs are identified per frame. The interaction intensity is formalized by introducing Vector Time to Collision (VTTC)—a surrogate safety measure extending classic TTC by inferring future conflict at the Closest Point of Approach, using full state vectors (relative position and velocity).
Figure 2: Multi-agent interaction scenario indicating ego-vehicle (ID 3913) and prioritized critical targets for navigation.
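The idea behind a vector-based time to collision can be sketched as follows. This is a minimal illustration that assumes both agents keep constant velocity, computes the time at which they reach their Closest Point of Approach (CPA), and reports that time only when the predicted miss distance falls within a conflict radius. The function names and the `conflict_radius` parameter are assumptions made for illustration; the paper's exact VTTC formulation may differ.

```python
import math

def time_to_cpa(rel_pos, rel_vel, eps=1e-9):
    """Time (s) until closest approach under constant relative velocity.

    rel_pos, rel_vel: (x, y) position and velocity of the VRU relative to
    the ego vehicle. Returns math.inf if the agents are not converging.
    """
    rx, ry = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    if speed_sq < eps:
        return math.inf  # no relative motion, so the gap never changes
    t = -(rx * vx + ry * vy) / speed_sq
    return t if t >= 0 else math.inf  # negative t means the agents diverge

def vttc(rel_pos, rel_vel, conflict_radius=1.0):
    """Time to CPA if the predicted miss distance is within conflict_radius.

    conflict_radius (metres) is a hypothetical threshold for this sketch.
    """
    t = time_to_cpa(rel_pos, rel_vel)
    if math.isinf(t):
        return math.inf
    miss = math.hypot(rel_pos[0] + rel_vel[0] * t, rel_pos[1] + rel_vel[1] * t)
    return t if miss <= conflict_radius else math.inf

# Usage: a VRU 10 m ahead, closing head-on at 5 m/s.
print(vttc((10.0, 0.0), (-5.0, 0.0)))  # 2.0
```

Because the computation uses the full relative position and velocity vectors, it also handles crossing and oblique encounters, which a classic TTC defined along a single lane axis does not capture.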
Dataset Statistics and Comparative Analysis
User Composition and Motion Patterns
VRUD is distinguished by its extreme VRU prevalence (87%), which exceeds that of all major benchmarks. Pedestrians and motorcycles are especially dominant, and motorcycles exhibit the highest average speeds, reflecting how efficiently they move through these unregulated environments.
Figure 3: Categorical and velocity statistics—pedestrians, motorcycles predominate; motorcycles possess highest mean speed.
Comparative analysis with datasets such as INTERACTION, SIND, and inD shows that VRUD draws the large majority of its trajectories (roughly 87%) from VRUs, in stark contrast to these vehicle-dominated benchmarks.
Figure 4: Positive correlation between VRU count, scenario complexity, and mean VTTC, substantiating the impact of VRU density on interaction dynamics.
Behavioral Characterization via VTTC
Of ∼5,000 initial interaction segments, 4,002 scenarios meet the interaction-relevance criterion (a VTTC upper-quartile threshold of 1.53s), with interaction intensity converging strongly at 0.7s. This 0.7s temporal filter is empirically derived and marks the critical “negotiation” window of real-world vehicle-VRU interaction, rather than pure collision avoidance.
Figure 5: Ego-speed distribution reflects quasi-normal range, highlighting low-speed maneuvering typical in dense VRU urban traffic.
Figure 6: Interaction intensity analysis reveals strong coupling around a 0.7s VTTC threshold, optimizing high-relevance sample selection.
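The two thresholds described above lend themselves to a simple two-tier filter. The sketch below assumes each scenario has already been reduced to its minimum VTTC and that selection is a "less than or equal" comparison; the `Scenario` record, its field names, and the toy values are hypothetical and are not drawn from the dataset.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    ego_id: int        # hypothetical identifier of the ego vehicle
    min_vttc_s: float  # smallest VTTC observed in the scenario, in seconds

def select_scenarios(scenarios, relevance_s=1.53, negotiation_s=0.7):
    """Split scenarios into relevant ones and the high-intensity subset.

    relevance_s and negotiation_s default to the 1.53s and 0.7s values
    reported in the text; the comparison rule itself is an assumption.
    """
    relevant = [s for s in scenarios if s.min_vttc_s <= relevance_s]
    critical = [s for s in relevant if s.min_vttc_s <= negotiation_s]
    return relevant, critical

# Usage with made-up values, for illustration only.
toy = [Scenario(1, 0.5), Scenario(2, 1.2), Scenario(3, 2.4)]
relevant, critical = select_scenarios(toy)
print([s.ego_id for s in relevant], [s.ego_id for s in critical])  # [1, 2] [1]
```

Keeping the thresholds as parameters makes it easy to test how sensitive the selected subset is to the choice of cut-off.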
Significant findings include:
- Vehicle velocity exhibits an inverse correlation with VRU count—ego speed declines as interaction density rises.
- Behavioral data supports the emergence of a universal “negotiation corridor,” with vehicles self-regulating within tactical speed regimes.
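One way to check the inverse speed-density relationship on per-scenario data is a Pearson correlation between VRU count and ego speed. The sketch below is a plain-Python illustration; the function name `pearson_r` and the sample values are made up for demonstration and do not reproduce the paper's analysis or results.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

# Synthetic example: VRU count per scenario and mean ego speed (m/s).
vru_counts = [1, 2, 4, 6, 9]
ego_speeds = [6.0, 5.2, 4.1, 3.0, 2.2]
r = pearson_r(vru_counts, ego_speeds)  # negative when speed falls as density rises
```

A coefficient near -1 would indicate a strong inverse linear relationship; a rank-based measure such as Spearman's would be a reasonable alternative if the relationship is monotonic but not linear.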
Implications and Future Directions
VRUD’s architecture and data structure provide immediate integration points for rigorous ADS research, including but not limited to:
- Trajectory prediction: High-variance, high-density VRU movement data enhances model robustness under “long-tail” edge cases.
- Behavioral intent recognition: Rich annotations and spatio-temporal coupling support fine-grained intent inference and scenario simulation.
- End-to-end planning/safety validation: VTTC-labeled scenarios facilitate risk-critical module evaluation, with the 0.7s filter supporting automated hard case mining.
Practically, VRUD fills the gap left by prior structured datasets, enabling downstream research to address the operational design domain limitations that ADS encounter in urban deployments worldwide. Theoretical advances are also expected in surrogate safety measure calibration and in adaptive planning policies for agent negotiation in highly uncertain state spaces.
Future development should include expansion to diverse geographies, integration with sensor fusion data (e.g., ground LIDAR or AV-mounted sensors for cross-modal validation), and the progressive automation of behavioral annotation leveraging advancements in relational deep learning.
Conclusion
VRUD is a comprehensive, open-source drone-derived dataset explicitly confronting the limitations of current vehicular trajectory corpora for unstructured urban scenarios. Its scale, resolution, and scenario extraction protocols position it as a pivotal resource for progressing socially compliant, risk-aware ADS research and robust model validation. The high-frequency, multi-agent labeled trajectories and empirically grounded interaction filters lay a robust foundation for the next phase of simulation-driven, safety-critical autonomous vehicle system development.